Abstract. We show how to combine Fourier analysis with coupling arguments to bound the mixing times of a variety of Markov chains. The mixing time is the number of steps a Markov chain takes to approach its equilibrium distribution. One application is to a class of Markov chains introduced by Luby, Randall, and Sinclair to generate random tilings of regions by lozenges. For an $\ell \times \ell$ region we bound the mixing time by $O(\ell^4 \log \ell)$, which improves on the previous bound of $O(\ell^7)$, and we show the new bound to be essentially tight. In another application we resolve a few questions raised by Diaconis and Saloff-Coste by lower bounding the mixing time of various card-shuffling Markov chains. Our lower bounds are within a constant factor of their upper bounds. When we use our methods to modify a path-coupling analysis of Bubley and Dyer, we obtain an $O(n^3 \log n)$ upper bound on the mixing time of the Karzanov-Khachiyan Markov chain for linear extensions.



1. Introduction 



Using a simple idea, we obtain improved upper and lower bounds on the mixing times of a 
number of previously studied Markov chains: 

• A lozenge is a rhombus with angles 120° and 60° and sides of unit length. Figure 1 shows a random lozenge tiling of a hexagon. Random lozenge tilings were originally studied in physics as a model for dimer systems, and have recently served as an exploratory tool for people in combinatorics. § 5 gives further background. Luby, Randall, and Sinclair (1995) proposed a Markov chain to generate random lozenge tilings of regions, proved that it runs in time $O(n^4)$ when there are $n$ lozenges, and in later unpublished work reduced the bound to $O(n^{3.5})$. (They also analyzed domino-tiling and Eulerian-orientation Markov chains.) For the regular hexagon with side length $\ell$, for example, their methods give a bound of $O(\ell^7)$. We show here that the correct mixing time of this Markov chain is $\Theta(\ell^4 \log \ell) = \Theta(n^2 \log n)$. We do this by showing that the state is very far from stationarity after $\sim (8/\pi^2)\,\ell^4 \log \ell$ steps, and very close to stationarity after $\sim (48/\pi^2)\,\ell^4 \log \ell$ steps. The correct constant appears to be $16/\pi^2$. For general regions of size $n$ and width $w$, our upper bound is $\sim (3/\pi^2)\,w^2 n \log n$.

Figure 1: Random lozenge tiling of the order 10 hexagon, chosen uniformly at random from the 9265037718181937012241727284450000 possible such tilings. Here $\ell = 10$, $w = 20$, and $n = 300$.

• Consider the following shuffle on a deck of $n$ cards: with probability 1/2 do nothing, otherwise transpose a random adjacent pair of cards. How many of these operations are needed before the deck becomes well-shuffled? Aldous (1983, sect. 4) showed that $O(n^3 \log n)$ shuffles are enough and that $\Omega(n^3)$ shuffles are necessary to randomize the deck. In the years since, there have been a couple of heuristic arguments (Aldous, 1997) (Diaconis, 1997) for why $\Omega(n^3 \log n)$ shuffles should be necessary, but unruly technical difficulties prevented a rigorous proof from being written down. Using our method these technical difficulties vanish. With little more than algebra and trigonometry, we show that $(1/\pi^2 - o(1))\,n^3 \log n$ shuffles are necessary to begin to randomize the deck, and that $(2/\pi^2 + o(1))\,n^3 \log n$ shuffles are enough. The best previous published explicit upper bound was $(4 + o(1))\,n^3 \log n$ shuffles. The correct constant appears to be $1/\pi^2$.

We lower bound the mixing time of a few other shuffles analyzed by Diaconis and Saloff-Coste (1993a). They considered shuffles for which the cards appear on the vertices of a graph, and the shuffle picks a random edge and transposes the cards at its endpoints. They obtained upper bounds on the mixing time for shuffles based on the $\sqrt{n} \times \sqrt{n}$ grid and on the $(\log_2 n)$-dimensional hypercube, but did not have matching lower bounds. We provide lower bounds, showing that their upper bounds are correct to within constant factors.



Counting the number of linear extensions of a partially ordered set is #P-complete (Brightwell and Winkler, 1991), making it computationally intractable. (One application of counting linear extensions is in a data mining application that infers partial orders (Mannila and Meek, 2000).) One can approximately count linear extensions by randomly generating them, so there have been a number of articles on generating random linear extensions. The latest, by Bubley and Dyer (199?), proposes a Markov chain in which pairs of elements in the linear extension are randomly transposed if doing so respects the partial order. To make the analysis easy, their Markov chain selects a random site with probability given by a parabolic curve, and then attempts to transpose the elements at this random site. We show here that the uniform distribution on sites works as well, obtaining a constant factor that is only marginally worse.

We give further background on these various Markov chains in later sections where we analyze them. In § 2 we provide basic definitions, such as what it means formally for the state of a Markov chain to be close to random. We study in § 3 a Markov chain for generating random lattice paths, since it is simple, yet illustrates the key ideas that we will use to analyze the various other Markov chains. We use Fourier analysis to define on the state space of the Markov chain a function $\Phi$ that has a certain contraction property. With $S$ denoting the current state of the Markov chain, and the random variable $S'$ denoting the next state of the chain, we have $E[\Phi(S') \mid S] = (1 - \gamma)\,\Phi(S)$. We derive both upper bounds and lower bounds using this contraction property. After § 3 the remaining sections may be read in any order. We see in § 4 how to apply the results about the path Markov chain to the chain for shuffling by random adjacent transpositions. In § 5 we generalize the upper bound for the path Markov chain to upper bound the mixing time of the lozenge-tiling Markov chain introduced by Luby, Randall, and Sinclair. In § 6 we modify Bubley and Dyer's path-coupling analysis of the Karzanov-Khachiyan Markov chain to obtain the $O(n^3 \log n)$ mixing time bound. When using a local randomizing operation to update a high-dimensional configuration, one typically either updates a random coordinate each step, or else updates the coordinates in a systematic order. In § 7 we compare these two methods for the chains that we are studying; our analysis indicates that the second method is better. We take a second look at the lattice path and permutation Markov chains in § 8, and refine our previous arguments to obtain tighter constants. We consider exclusion and exchange processes in § 9, where among other things we resolve the aforementioned questions of Diaconis and Saloff-Coste. Many of the mixing time upper and lower bounds we give differ by small constant factors. In § 10 we give heuristic arguments and present experimental evidence for determining the correct constant factors in the mixing times. We summarize in Table 1 many of these mixing time bounds and their (conjectural) correct values. § 10 also contains several open problems for further research. We make some concluding remarks in § 11.
| State space, Markov chain | Parameter | (Conjectural) correct answer | Rigorous lower bound | Rigorous upper bound |
|---|---|---|---|---|
| Paths in $n/2 \times n/2$ box, random adjacent transpositions | coupling threshold | $(2/\pi^2)\,n^3 \log n$ | $(2/\pi^2)\,n^3 \log n$ | $(2/\pi^2)\,n^3 \log n$ |
| | variation threshold | $(1/\pi^2)\,n^3 \log n$ | $(1/\pi^2)\,n^3 \log n$ | $(2/\pi^2)\,n^3 \log n$ |
| | separation threshold | $[?]\,n^3 \log n$ | $(1/\pi^2)\,n^3 \log n$ | $(4/\pi^2)\,n^3 \log n$ |
| | spectral gap | $(1 - \cos(\pi/n))/(n-1)$ | $(1 - \cos(\pi/n))/(n-1)$ | $(1 - \cos(\pi/n))/(n-1)$ |
| Permutations $S_n$, random adjacent transpositions | coupling threshold | $(4/\pi^2)\,n^3 \log n$ | $(2/\pi^2)\,n^3 \log n$ | $(4/\pi^2)\,n^3 \log n$ |
| | variation threshold | $(1/\pi^2)\,n^3 \log n$ | $(1/\pi^2)\,n^3 \log n$ | $(2/\pi^2)\,n^3 \log n$ |
| | separation threshold | $[?]\,n^3 \log n$ | $(1/\pi^2)\,n^3 \log n$ | $(4/\pi^2)\,n^3 \log n$ |
| | spectral gap | $(1 - \cos(\pi/n))/(n-1)$ | $(1 - \cos(\pi/n))/(n-1)$ | $(1 - \cos(\pi/n))/(n-1)$ |
| Lozenge tilings of order $\ell$ hexagon, Luby-Randall-Sinclair chain | coupling threshold | ?? | $(8/\pi^2)\,\ell^4 \log \ell$ | $(48/\pi^2)\,\ell^4 \log \ell$ |
| | variation threshold | | $(8/\pi^2)\,\ell^4 \log \ell$ | $(48/\pi^2)\,\ell^4 \log \ell$ |
| | separation threshold | | $(8/\pi^2)\,\ell^4 \log \ell$ | $(96/\pi^2)\,\ell^4 \log \ell$ |
| | spectral gap | $(1 - \cos(\pi/(2\ell)))/(\ell(2\ell-1))$ | $(1 - \cos(\pi/(2\ell)))/(\ell(2\ell-1))$ | $(1 - \cos(\pi/(2\ell)))/(\ell(2\ell-1))$ |
| Linear extensions of partially ordered set, Karzanov-Khachiyan chain | variation threshold | depends on poset | | $(4/\pi^2)\,n^3 \log n$, sometimes tight to within constants |
| | spectral gap | depends on poset | $(1 - \cos(\pi/n))/(n-1)$, sometimes tight | |
| Lozenge tilings of region of $n$ triangles and width $w$, Luby-Randall-Sinclair chain | variation threshold | depends on region | | $(3/\pi^2)\,w^2 n \log n$, sometimes tight to within constants |
| | spectral gap | depends on region | $(1 - \cos(\pi/w))/n$, sometimes tight to within constants | |

Table 1. Summary of mixing time bounds for several classes of Markov chains considered here. The variation and separation thresholds are defined in § 2. The bounds for lattice paths and permutations are proved in § 3 and § 8, the bounds for lozenge tilings are proved in § 5, and the bounds for the Karzanov-Khachiyan chain are proved in § 6. The coupling times are for the natural monotone grand couplings described in § 3, § 4, and § 5. The conjectural correct answers are derived in § 10. The spectral gap for permutations was previously known (Diaconis, unpublished).



2. Preliminaries 

Here we review some basic definitions and properties pertaining to mixing times and couplings. For a more complete introduction to these ideas see Aldous (1983) or Aldous and Fill (200X).



When a Markov chain is started in a state $x$ and run for $t$ steps, we denote the distribution of the state at time $t$ by $P_x^t$, where $P$ is the state transition matrix of the Markov chain. If the Markov chain is connected and aperiodic, then as $t \to \infty$ the distribution $P_x^t$ converges to a unique stationary distribution, which is often denoted by $\pi$. Because we will use $\pi$ to denote the ratio of the circumference of a circle to its diameter, we use $\mu$ here to denote the stationary distribution. In all our examples the stationary distribution is the uniform distribution, which we denote with $U$.

To measure the distance between the distributions $P_x^t$ and $\mu$ one usually uses the total variation distance. With $\mathcal{X}$ denoting the state space, the variation distance is defined by

$$\|P_x^t - \mu\|_{TV} = \max_{A \subset \mathcal{X}} |P_x^t(A) - \mu(A)| = \frac{1}{2}\sum_{y \in \mathcal{X}} |P_x^t(y) - \mu(y)| = \frac{1}{2}\|P_x^t - \mu\|_1 .$$

The variation distance when the chain is started from the worst start state is denoted by

$$d(t) = \max_{x} \|P_x^t - \mu\|_{TV},$$

though it is often more convenient to work with

$$\bar d(t) = \max_{x,y} \|P_x^t - P_y^t\|_{TV},$$

since $\bar d(t)$ is submultiplicative whereas $d(t)$ is not. It is easy to see that $d(t) \le \bar d(t) \le 2d(t)$.
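As a small numerical illustration of these definitions, the sketch below computes the total variation distance both as a maximum over events and as half an $L^1$ distance, and evaluates $d(t)$ and $\bar d(t)$. It uses a hypothetical five-state lazy walk on a path, a toy example of our own rather than one of the chains studied in this article, and checks $d(t) \le \bar d(t) \le 2d(t)$ and the submultiplicativity of $\bar d$ numerically. Only plain Python lists are used, so nothing beyond the standard library is assumed.

```python
from itertools import chain, combinations

def mat_mul(A, B):
    """Product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, t):
    """P^t by repeated multiplication (adequate for tiny chains)."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = mat_mul(R, P)
    return R

def tv(p, q):
    """Total variation distance computed as half the L1 distance."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

def tv_events(p, q):
    """Total variation distance computed as max over events A of |p(A) - q(A)|."""
    states = range(len(p))
    events = chain.from_iterable(combinations(states, r) for r in range(len(p) + 1))
    return max(abs(sum(p[i] - q[i] for i in A)) for A in events)

def d(P, mu, t):
    """Worst-start distance from stationarity, d(t)."""
    return max(tv(row, mu) for row in mat_pow(P, t))

def dbar(P, t):
    """Distance between the worst pair of starting states, d-bar(t)."""
    Pt = mat_pow(P, t)
    return max(tv(r, s) for r in Pt for s in Pt)

# Hypothetical toy chain: lazy walk on {0, ..., 4}. P is symmetric, so mu is uniform.
m = 5
P = [[0.25 if abs(i - j) == 1 else 0.0 for j in range(m)] for i in range(m)]
for i in range(m):
    P[i][i] = 1.0 - sum(P[i])
mu = [1.0 / m] * m

for t in (1, 5, 20):
    print(t, round(d(P, mu, t), 4), round(dbar(P, t), 4))
```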

The (variation) mixing time is the time it takes for $d(t)$ to "become small" (say, less than $1/e$). It is a surprising fact that for many classes of Markov chains there is a threshold time $T$ such that $d((1 - \varepsilon)T) > 1 - \varepsilon$ but $d((1 + \varepsilon)T) < \varepsilon$, where $\varepsilon$ tends to 0 as the "size" of the Markov chain gets large; see Diaconis (1996) for a survey of this "cutoff phenomenon."

Most of our mixing time upper bounds are derived via coupling arguments. In a (pairwise) coupling there are two copies $X_t$ and $Y_t$ of the Markov chain that are run in tandem. The $X_t$'s by themselves follow the transition rule of the Markov chain, as do the $Y_t$'s, but the joint distribution of $(X_{t+1}, Y_{t+1})$ given $(X_t, Y_t)$ is often contrived to make the two copies of the Markov chain quickly coalesce (become equal). It is a standard fact that

$$\bar d(t) \le \max_{x,y} \Pr[X_t \ne Y_t \mid X_0 = x, Y_0 = y],$$

so a coupling that coalesces quickly can give us a good upper bound on the mixing time. Most of the variation threshold upper bounds in Table 1 follow from this relation and a corresponding coupling time bound, and lower bounds on the coupling time likewise follow from corresponding lower bounds on the variation threshold time. The remaining variation threshold upper bounds and coupling time lower bounds in Table 1 that do not follow from this relation are derived in § 8.

Many pairwise couplings can be extended to "grand couplings," where at each time step there is a random function $F_t$ defined on the whole state space of the Markov chain, and $X_{t+1} = F_t(X_t)$ and $Y_{t+1} = F_t(Y_t)$ for any $X_t$ and $Y_t$. For example, if the state space is the set of permutations on $n$ cards, then the update rule "pick a random adjacent pair of cards, and flip a coin to decide whether to place them in increasing order or decreasing order" defines a grand coupling; the choice of the adjacent pair and the value of the coin flip define the random function on permutations. In § 6 and § 8 we will also consider pairwise couplings that do not extend to grand couplings.

All of the grand couplings considered in this article are monotone, which is to say that there is a partial order $\preceq$ such that if $x \preceq y$ then also $F_t(x) \preceq F_t(y)$. All of the partial orders considered here have a maximal element, denoted $\hat 1$, and a minimal element, denoted $\hat 0$, so that $\hat 0 \preceq x \preceq \hat 1$ for each $x$ in the state space. Monotone grand couplings are particularly convenient for algorithms (see e.g. Propp and Wilson (1996) or Fill (1998)).
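To make "monotone" concrete for the lattice paths of § 3, the sketch below is an illustration we supply, not code from this article. It orders paths by comparing their up-move counts prefix by prefix, so that $\hat 0$ is $0^a1^b$ and $\hat 1$ is $1^b0^a$. It then checks exhaustively, for small $a$ and $b$, that every deterministic update "sort or reverse-sort the adjacent pair $(i, i+1)$" preserves this order. The names `precedes`, `update`, and `is_monotone` are hypothetical.

```python
from itertools import combinations

def all_paths(a, b):
    """All 0/1 strings with a zeros (down moves) and b ones (up moves)."""
    n = a + b
    return ["".join("1" if k in ones else "0" for k in range(n))
            for ones in combinations(range(n), b)]

def precedes(p, q):
    """p <= q: at every prefix, p has made no more up moves than q."""
    up_p = up_q = 0
    for cp, cq in zip(p, q):
        up_p += cp == "1"
        up_q += cq == "1"
        if up_p > up_q:
            return False
    return True

def update(path, i, up):
    """The map F for one (site, coin) outcome: reverse-sort (up) or sort pair (i, i+1)."""
    pair = "".join(sorted(path[i:i + 2], reverse=up))
    return path[:i] + pair + path[i + 2:]

def is_monotone(a, b):
    """Check that x <= y implies F(x) <= F(y) for every map F and every ordered pair."""
    paths = all_paths(a, b)
    for i in range(a + b - 1):
        for up in (False, True):
            for x in paths:
                for y in paths:
                    if precedes(x, y) and not precedes(update(x, i, up), update(y, i, up)):
                        return False
    return True

print(is_monotone(3, 3), is_monotone(2, 4))   # True True
```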

In § 10 we consider not only the variation distance, but also the separation distance, which is defined by

$$s(t) = \max_{x,y}\left[1 - \frac{P_x^t(y)}{\mu(y)}\right].$$

The function $s(t)$ is also submultiplicative, and also often exhibits a sharp threshold. In general $d(t) \le s(t)$, and for reversible Markov chains $s(2t) \le 2\bar d(t) - \bar d(t)^2$ (see Aldous and Fill, 200X, Chapt. 4, Lemma 7). The rigorous bounds in Table 1 pertaining to separation distance follow from these relations and the corresponding bounds for the variation distance.
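The sketch below checks these two relations numerically. It uses a hypothetical five-state lazy walk, a toy chain of our own that is reversible because its transition matrix is symmetric. The code evaluates $s(t)$ directly from its definition and checks $d(t) \le s(t)$ and $s(2t) \le 2\bar d(t) - \bar d(t)^2$. It illustrates the inequalities on one example; it is not a proof.

```python
def mat_mul(A, B):
    """Product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, t):
    """P^t by repeated multiplication (adequate for tiny chains)."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = mat_mul(R, P)
    return R

def tv(p, q):
    """Total variation distance: half the L1 distance."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

def separation(P, mu, t):
    """s(t) = max over x, y of 1 - P^t(x, y) / mu(y)."""
    Pt = mat_pow(P, t)
    n = len(P)
    return max(1 - Pt[x][y] / mu[y] for x in range(n) for y in range(n))

def d(P, mu, t):
    return max(tv(row, mu) for row in mat_pow(P, t))

def dbar(P, t):
    Pt = mat_pow(P, t)
    return max(tv(r, s) for r in Pt for s in Pt)

# Hypothetical reversible toy chain: lazy walk on {0, ..., 4} with uniform mu.
m = 5
P = [[0.25 if abs(i - j) == 1 else 0.0 for j in range(m)] for i in range(m)]
for i in range(m):
    P[i][i] = 1.0 - sum(P[i])
mu = [1.0 / m] * m
```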
3. Lattice path Markov chain




Figure 2: A lattice path through the $4 \times 5$ rectangle, with the $1 \times 1$ boxes underneath it shaded. Here $a = 4$, $b = 5$, $n = 9$, and the lattice path's encoding is 011011100.


Consider an $a \times b$ rectangle composed of $1 \times 1$ boxes rotated 45°, so that the sides of length $a$ are oriented northwest/southeast. A lattice path (see Figure 2) is a traversal from the leftmost corner to the rightmost corner of the rectangle, traveling along the borders of the $1 \times 1$ boxes, so that each move is either up-and-right or down-and-right. Such lattice paths can be encoded as strings of length $a + b$ consisting of $a$ 0's (down moves) and $b$ 1's (up moves). There are $n!/(a!\,b!)$ such lattice paths, where for convenience we let $n = a + b$. These lattice paths correspond to sets of $1 \times 1$ boxes that are "stable under gravity," i.e. no box lies above an empty cell.
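The encoding and the count $n!/(a!\,b!)$ are easy to check by brute force. The sketch below, our own illustration with hypothetical helper names, enumerates the 0/1 strings for the $4 \times 5$ example of Figure 2. It also counts the boxes lying under a path. Raising the path by one box turns an adjacent "01" into "10", and the bottom path $0^a1^b$ has no boxes under it, so under this encoding the number of shaded boxes equals the number of pairs in which a 1 precedes a 0.

```python
import math
from itertools import combinations

def lattice_paths(a, b):
    """All encodings with a '0's (down moves) and b '1's (up moves)."""
    n = a + b
    for ones in combinations(range(n), b):
        yield "".join("1" if k in ones else "0" for k in range(n))

def boxes_below(path):
    """Number of 1x1 boxes under the path = number of pairs (1 before 0)."""
    ones_seen = boxes = 0
    for ch in path:
        if ch == "1":
            ones_seen += 1
        else:
            boxes += ones_seen
    return boxes

a, b = 4, 5
paths = list(lattice_paths(a, b))
print(len(paths), math.comb(a + b, a))   # 126 126
print(boxes_below("011011100"))          # 12
```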

Consider the following Markov chain for randomly generating a lattice path between the opposite corners of the $a \times b$ rectangle. Given a path, the Markov chain randomly picks one of the $n - 1$ internal columns (we assume $n \ge 2$). It then randomly decides whether to try pushing the path up at that point, or to try pushing it down. If pushing the path up (or down) would result in an invalid path, the Markov chain simply sits idle during that step. Equivalently, in the binary string representation, the Markov chain picks a random adjacent pair of letters in the string, and then randomly decides either to sort them or to reverse-sort them. Understanding this Markov chain will be instrumental to understanding the Markov chains for random lozenge tilings, for card shuffling by random adjacent transpositions, and for other types of card shuffling.
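In the binary-string form the chain takes only a few lines. The sketch below is one step of it, written by us under the conventions above ('1' = up, '0' = down); `chain_step` is a hypothetical name. Sorting an adjacent "10" into "01" pushes the path down by one box, and reverse-sorting "01" into "10" pushes it up. On "00" or "11" both choices leave the string unchanged, which is exactly the idle step.

```python
import random

def chain_step(path, rng):
    """One step of the lattice-path chain on a 0/1 string."""
    i = rng.randrange(len(path) - 1)                     # one of the n - 1 internal columns
    up = rng.random() < 0.5                              # fair coin: push up or push down
    pair = "".join(sorted(path[i:i + 2], reverse=up))    # '10' if up else '01'
    return path[:i] + pair + path[i + 2:]

rng = random.Random(0)
path = "011011100"
for _ in range(1000):
    path = chain_step(path, rng)
print(path.count("0"), path.count("1"))   # 4 5
```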

3.1. Contraction property. We will analyze this Markov chain by measuring 1) the "displacement" of a given single path on the $a \times b$ rectangle from "equilibrium", and 2) the "gap" between two such paths when one of the paths is entirely above the other. We lower bound the mixing time by computing the displacement of a path when not enough steps have been taken, and showing that it is typically different from the displacement of a random path. We upper bound the mixing time by showing that after enough steps, starting from the top and bottom paths, the expected gap is so small that they have almost surely coalesced to the same path.

It will be useful later to use horizontal coordinates that range from $-n/2$ to $n/2$. Let $h(x)$ denote the height of the path at position $x$ relative to the line connecting the opposite corners of the box. That is, $h(x)$ is the number of up moves to the left of position $x$ minus the expected number of such up moves. Thus $h(-n/2) = h(n/2) = 0$, and $h(x) = h(x-1) + a/n$ if there was an up move between $x - 1$ and $x$, or else $h(x) = h(x-1) - b/n$ if there was a down move. For the example in Figure 2 the heights change by $+4/9 = a/(a+b)$ for up moves and $-5/9 = -b/(a+b)$ for down moves, and are as follows:



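| $x$ | -9/2 | -7/2 | -5/2 | -3/2 | -1/2 | 1/2 | 3/2 | 5/2 | 7/2 | 9/2 |
|---|---|---|---|---|---|---|---|---|---|---|
| $h(x)$ | 0 | -5/9 | -1/9 | 3/9 | -2/9 | 2/9 | 6/9 | 10/9 | 5/9 | 0 |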
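The height table can be reproduced mechanically. Below is a minimal sketch of our own that maps an encoding to the values $h(-n/2), \ldots, h(n/2)$ using the increments $+a/n$ and $-b/n$. It uses exact rational arithmetic from Python's standard `fractions` module, which prints 3/9 and 6/9 in lowest terms.

```python
from fractions import Fraction

def heights(path):
    """h(x) for x = -n/2, ..., n/2 (list index k corresponds to x = k - n/2).

    An up move ('1') adds a/n and a down move ('0') subtracts b/n, where
    a = number of 0's and b = number of 1's, so h vanishes at both ends.
    """
    n, a, b = len(path), path.count("0"), path.count("1")
    h = [Fraction(0)]
    for ch in path:
        h.append(h[-1] + (Fraction(a, n) if ch == "1" else -Fraction(b, n)))
    return h

print([str(v) for v in heights("011011100")])
# ['0', '-5/9', '-1/9', '1/3', '-2/9', '2/9', '2/3', '10/9', '5/9', '0']
```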



The displacement function $\Phi$ of $h$ that we will find useful is

$$\Phi(h) = \sum_{x=-n/2}^{n/2} h(x)\cos\frac{\beta x}{n}, \qquad (1)$$
where $0 < \beta \le \pi$. This function weighs deviations from expectation more heavily near the middle of the path than near its endpoints. Given two lattice paths with height functions $\check h$ and $\hat h$, where $\check h(x) \le \hat h(x)$ for all $x$, define the gap function to be $\hat h - \check h$ and the gap to be

$$\Phi(\hat h - \check h) = \Phi(\hat h) - \Phi(\check h).$$

Note that since $0 < \beta \le \pi$, the gap is strictly positive when the paths $\hat h$ and $\check h$ differ, and is 0 otherwise. After the Markov chain has equilibrated, so that each path is equally likely, $E[h(x)] = 0$, so the expected displacement is $E[\Phi(h)] = 0$.
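Both claims in this paragraph can be checked exactly on a small rectangle. The sketch below, an illustration with hypothetical helper names, averages $\Phi$ over all paths for $a = 3$, $b = 4$ and a few values of $\beta$. It also confirms that the gap $\Phi(\hat h) - \Phi(\check h)$ is positive for every pair of distinct ordered paths. Heights are computed from up-move counts as $h = (\text{up moves so far}) - bk/n$ at site $k$, which agrees with the increments $+a/n$ and $-b/n$.

```python
import math
from itertools import combinations

def all_paths(a, b):
    n = a + b
    return ["".join("1" if k in ones else "0" for k in range(n))
            for ones in combinations(range(n), b)]

def heights(path):
    """h at sites k = 0..n (x = k - n/2): up moves so far minus their expected number."""
    n, b = len(path), path.count("1")
    ups, h = 0, [0.0]
    for k, ch in enumerate(path, start=1):
        ups += ch == "1"
        h.append(ups - b * k / n)
    return h

def phi(h, beta):
    """Displacement, Equation (1): sum over x of h(x) cos(beta x / n)."""
    n = len(h) - 1
    return sum(hk * math.cos(beta * (k - n / 2) / n) for k, hk in enumerate(h))

def above(p, q):
    """True if path p lies weakly above path q (height differences are integers)."""
    return all(hp >= hq - 1e-12 for hp, hq in zip(heights(p), heights(q)))

paths = all_paths(3, 4)
print(sum(phi(heights(p), math.pi) for p in paths) / len(paths))   # ~0 up to rounding
```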

Lemma 1. Let the displacement function $\Phi$ be defined by Equation (1). Suppose $h$ is a height function (so $\Phi(h)$ is the displacement) and $\beta = \pi$, or $h = \hat h - \check h$ is a gap function (so $\Phi(h)$ is the gap) and $0 < \beta \le \pi$. Let $h'$ be the height or gap function after one step of the Markov chain. Then

$$E[\Phi(h') - \Phi(h) \mid h] \le \frac{-1 + \cos(\beta/n)}{n-1}\,\Phi(h),$$

with equality when $\beta = \pi$. The coefficient on the right-hand side is bounded by

$$-\frac{\beta^2}{2n^2(n-1)} \le \frac{-1 + \cos(\beta/n)}{n-1} \le -\frac{\beta^2}{2n^3}.$$

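Proof. Suppose we pick a site $x$, flip a coin, and adjust the height accordingly. Then the expected value of the new height at $x$ is just $[h(x+1) + h(x-1)]/2$. Assume that we pick each site (other than $-n/2$ and $n/2$) with probability $1/p$, where $p = n - 1$ is the number of positions that can be picked. Then, with primes denoting the updated variables,

$$E[h'(x) \mid h] = \frac{p-1}{p}\,h(x) + \frac{1}{p}\,\frac{h(x+1) + h(x-1)}{2}$$

when $-n/2 < x < n/2$, so that

$$\begin{aligned}
E[\Phi(h') \mid h] &= \sum_{x=-n/2+1}^{n/2-1} E[h'(x) \mid h]\cos\frac{\beta x}{n} \qquad (2)\\
E[\Phi(h') \mid h] &= \frac{p-1}{p}\,\Phi(h) + \frac{1}{2p}\sum_{x=-n/2+1}^{n/2-1}\bigl[h(x+1) + h(x-1)\bigr]\cos\frac{\beta x}{n} \qquad (3)\\
E[\Phi(h') - \Phi(h) \mid h] &= -\frac{1}{p}\,\Phi(h) + \frac{1}{2p}\sum_{\substack{-n/2+1 \le x \le n/2-1\\ -n/2 \le y \le n/2\\ |x-y| = 1}} h(y)\cos\frac{\beta x}{n}\\
&= -\frac{1}{p}\,\Phi(h) + \frac{1}{2p}\sum_{\substack{-n/2+1 \le x \le n/2-1\\ -n/2+1 \le y \le n/2-1\\ |x-y| = 1}} h(y)\cos\frac{\beta x}{n} \qquad (4)\\
&\le -\frac{1}{p}\,\Phi(h) + \frac{1}{2p}\sum_{\substack{-n/2 \le x \le n/2\\ -n/2+1 \le y \le n/2-1\\ |x-y| = 1}} h(y)\cos\frac{\beta x}{n} \qquad (5)\\
&= -\frac{1}{p}\,\Phi(h) + \frac{1}{2p}\sum_{y=-n/2+1}^{n/2-1} h(y)\left[\cos\frac{\beta(y-1)}{n} + \cos\frac{\beta(y+1)}{n}\right]\\
&= -\frac{1}{p}\,\Phi(h) + \frac{1}{2p}\sum_{y=-n/2+1}^{n/2-1} h(y)\,2\cos\frac{\beta y}{n}\cos\frac{\beta}{n} \qquad (6)\\
&= \frac{-1 + \cos(\beta/n)}{p}\,\Phi(h) = \frac{-1 + \cos(\beta/n)}{n-1}\,\Phi(h),
\end{aligned}$$

where we have used $E[h'(\pm n/2)] = 0$ in Equation (2), $h(\pm n/2) = 0$ in Equation (4), and the trigonometric identity $\cos(\theta - \phi) + \cos(\theta + \phi) = 2\cos(\theta)\cos(\phi)$ in Equation (6). Inequality (5) is justified if $\beta \le \pi$, since by assumption $h = \hat h - \check h \ge 0$ for each $x$, and $\cos(\beta/2) \ge 0$; when $\beta = \pi$ it becomes an equality.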

To upper bound the right-hand side we use $\cos(x) \le 1 - x^2/2 + x^4/24 = 1 + (x^2/2)(-1 + x^2/12)$. With $c = \beta^2/12$ this gives

$$\frac{-1 + \cos(\beta/n)}{n-1} \le \frac{1}{n-1}\,\frac{(\beta/n)^2}{2}\left(-1 + \frac{c}{n^2}\right) = -\frac{(\beta/n)^2}{2}\,\frac{1 - c/n^2}{n-1} \le -\frac{\beta^2}{2n^3},$$

where to get the last step we used $(1 - c/n^2)/(n-1) \ge 1/n$ when $n > 1$ and $-c/n \ge -1$. Here $c \le \pi^2/12$, so these conditions are satisfied whenever $n > 1$. The lower bound is somewhat easier: use the bound $\cos(x) \ge 1 - x^2/2$. □
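Lemma 1 can be sanity-checked by exact enumeration on a small rectangle. The sketch below is our illustration, with hypothetical helper names. It averages over all $2(n-1)$ equally likely (site, coin) moves to get $E[\Phi(h') \mid h]$ exactly, and then runs three checks. It confirms the equality case $\beta = \pi$ for every height function with $a = 3$, $b = 4$. It confirms the inequality for the gap functions of ordered pairs at $\beta = 0.9\pi$, using the linearity of $\Phi$ and the shared moves of the coupling. Finally, it checks the two-sided bound on the coefficient over a range of $n$.

```python
import math
from itertools import combinations

def all_paths(a, b):
    n = a + b
    return ["".join("1" if k in ones else "0" for k in range(n))
            for ones in combinations(range(n), b)]

def phi(path, beta):
    """Phi of the height function of `path` (Equation (1)); h(k) = ups so far - b k / n."""
    n, b = len(path), path.count("1")
    ups, total = 0, 0.0
    for k in range(1, n):                       # h = 0 at both endpoints
        ups += path[k - 1] == "1"
        total += (ups - b * k / n) * math.cos(beta * (k - n / 2) / n)
    return total

def moves(path):
    """All 2(n-1) equally likely outcomes: sort ('01') or reverse-sort ('10') pair (i, i+1)."""
    for i in range(len(path) - 1):
        for up in (False, True):
            yield path[:i] + "".join(sorted(path[i:i + 2], reverse=up)) + path[i + 2:]

def expected_change(path, beta):
    """Exact E[Phi(h') - Phi(h) | h]."""
    outs = list(moves(path))
    return sum(phi(q, beta) for q in outs) / len(outs) - phi(path, beta)

def above(p, q):
    """p lies weakly above q: its prefix counts of up moves are never smaller."""
    up_p = up_q = 0
    for cp, cq in zip(p, q):
        up_p += cp == "1"
        up_q += cq == "1"
        if up_p < up_q:
            return False
    return True

def coefficient(n, beta):
    """The contraction coefficient (-1 + cos(beta/n)) / (n - 1) of Lemma 1."""
    return (-1 + math.cos(beta / n)) / (n - 1)

a, b = 3, 4
n = a + b
paths = all_paths(a, b)
```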

3.2. Upper bound. 

Theorem 2. When $n$ is large, after

$$\frac{2 + o(1)}{\pi^2}\, n^3 \log\frac{ab}{\varepsilon}$$

steps the variation distance from stationarity is at most $\varepsilon$, and the probability that the two extremal paths have coalesced is at least $1 - \varepsilon$. (The $o(1)$ term is a function of $n$ alone.)



Felsner and Wernisch (1997) reference an early version of this paper, since they used the upper bound in this theorem to bound the rate of convergence of the Karzanov-Khachiyan Markov chain for generating random linear extensions of a certain class of partially ordered sets. The Karzanov-Khachiyan Markov chain is discussed further in § 6.

Proof of Theorem 2. To obtain the upper bound, we consider a pair of coupled paths $\hat h_t$ and $\check h_t$ such that $\hat h_0$ is the topmost path and $\check h_0$ is the bottommost path. The sequences $\hat h_t$ and $\check h_t$ are generated by the Markov chain using "the same random moves", so that $\hat h_{t+1}$ and $\check h_{t+1}$ are obtained from $\hat h_t$ and $\check h_t$ respectively by sorting or unsorting (same random decision made in both cases) at the same random location $x$. One can check by induction that $\hat h_t \ge \check h_t$. Let $\Phi_t = \Phi(\hat h_t - \check h_t)$; then $\Phi_t = 0$ if and only if $\hat h_t = \check h_t$.

We compute $E[\Phi_t]$; when it is small compared to the minimum possible positive value of $\Phi_t$, it will follow that with high probability $\Phi_t = 0$. By choosing $\beta$ to be slightly smaller than $\pi$, we make this minimum positive value not too small, and thereby get a somewhat improved upper bound.

From Lemma 1 and induction, we get

$$E[\Phi_t] \le \Phi_0\left(1 - \frac{1 - \cos(\beta/n)}{n-1}\right)^t \le \Phi_0 \exp\left(-\frac{\beta^2 t}{2n^3}\right).$$

But E[<^>t] > Pr[^>i > 0]^>min- Thus after t > (2//32)n3 log($o/(^mme)) steps Fi[<^t > 0] < e. We 
have $0 < o.b, and <I>mm = cos(/3(n/2 — l)/n) > cos(/3/2) « (vr — (3)/2. The optimal choice of /? is 
TT — B(l/logn), but all that matters is that vr — /3^0asn^cxD while log(l/(vr — /?)) <^ log(a6). 
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Substituting, we find that t = (2/7r^ + ©(log log n/ log n))n^ log(a6/e) steps are enough to ensure 
that the probability of coalescence is at least 1 — e. □ 
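A quick Monte Carlo sketch of this argument follows. The code is our illustration, and its names and parameters are hypothetical. It runs the coupling from the top path $1^b0^a$ and the bottom path $0^a1^b$ with shared (site, coin) choices, and compares the observed coalescence times with the step count $t$ at which the proof's bound $\Phi_0\,\rho^t/\Phi_{\min}$ drops below $\varepsilon$, where $\rho = 1 - (1 - \cos(\beta/n))/(n-1)$. That step count is computed non-asymptotically, without dropping any $o(1)$ terms. For simplicity it uses $\beta = \pi$ rather than the optimized $\beta = \pi - \Theta(1/\log n)$.

```python
import math
import random

def coalescence_time(a, b, rng, max_steps=10**7):
    """Run the top path (1^b 0^a) and bottom path (0^a 1^b) with the same
    random (site, coin) moves until they agree; return the number of steps."""
    n = a + b
    top = ["1"] * b + ["0"] * a
    bot = ["0"] * a + ["1"] * b
    for t in range(1, max_steps + 1):
        i = rng.randrange(n - 1)      # adjacent pair of letters (i, i + 1)
        up = rng.random() < 0.5       # heads: reverse-sort ("10"), tails: sort ("01")
        for path in (top, bot):
            path[i:i + 2] = sorted(path[i:i + 2], reverse=up)
        if top == bot:
            return t
    raise RuntimeError("no coalescence within max_steps")

def rigorous_steps(a, b, eps, beta):
    """Smallest t with Phi_0 * rho^t / Phi_min <= eps (Lemma 1 plus Markov's inequality)."""
    n = a + b
    rho = 1 - (1 - math.cos(beta / n)) / (n - 1)
    # Gap between the extremal paths at site k (x = k - n/2): difference of up-move counts.
    phi0 = sum((min(k, b) - max(0, k - a)) * math.cos(beta * (k - n / 2) / n)
               for k in range(n + 1))
    phi_min = math.cos(beta * (n / 2 - 1) / n)
    return math.ceil(math.log(phi0 / (phi_min * eps)) / -math.log(rho))

rng = random.Random(2024)
a = b = 6
eps = 0.1
t_bound = rigorous_steps(a, b, eps, beta=math.pi)
times = [coalescence_time(a, b, rng) for _ in range(200)]
late = sum(t > t_bound for t in times) / len(times)
print(t_bound, sum(times) / len(times), late)
```

Since the bound is derived from Markov's inequality, the fraction of runs exceeding `t_bound` should be at most $\varepsilon$ up to sampling noise, and in practice it is usually far smaller.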

Remark: We will show in § 8 that when $a = b = n/2$ the coupling time is actually $(2/\pi^2)\,n^3 \log n$.

Remark: Proving mixing time upper bounds via a contraction property in the "distance" between configurations is a fairly standard technique. Traditionally the distance has been measured in terms of Hamming distance or some other integer-valued distance, which for our applications does not yield the requisite contraction property.

We get the spectral gap entries in Table 1 using similar reasoning (see also, e.g., Chen (1998)):

Proposition 3. If a function $\Phi$ is strictly monotone increasing in the partial order of a reversible monotone Markov chain with top and bottom states, and whenever $X \preceq Y$ we have $E[\Phi(Y') - \Phi(X') \mid X, Y] \le (1 - \gamma)(\Phi(Y) - \Phi(X))$, then the spectral gap must be at least $\gamma$.

Proof. Perturb the stationary distribution by an eigenvector associated with the second largest eigenvalue $\lambda$, and run the Markov chain starting from this distribution. After $t$ steps the variation distance from stationarity is $A\lambda^t$ for some constant $A > 0$. If we run the Markov chain starting from the top and bottom states, after $t$ steps the states are different with probability at most $(\Phi(\hat 1) - \Phi(\hat 0))(1 - \gamma)^t / \min_{X \prec Y}(\Phi(Y) - \Phi(X))$. The coupling time bound on the variation distance gives $\lambda \le 1 - \gamma$. □

3.3. Lower bound. We obtain a lower bound on the mixing time when the rectangle is not too narrow:

Theorem 4. If $\min(a, b) \gg 1$, then after

$$\left(\frac{1}{\pi^2} - o(1)\right) n^3 \log \min(a, b)$$

steps the variation distance from stationarity is $1 - o(1)$.

We use Lemma 1 with $\beta = \pi$ so that we get an exact expression for $E[\Phi_t]$. Then we bound $\operatorname{Var}[\Phi_t]$ to show that the distribution of $\Phi_t$ is sharply concentrated about its expected value. When $\Phi_t$ and $\Phi_\infty$ are sharply concentrated about values that are far enough apart, the chain is far from equilibrium. (This second-moment approach was also used by Diaconis and Shahshahani (1987) to lower bound the mixing time of random walk on $\mathbb{Z}_2^n$, and by Lee and Yau (1998) to lower bound the mixing time of the exclusion process on a circle.) The following technical lemma formalizes this argument. It is used to derive all the mixing time lower bounds we give in this article except the bound proved in § 9.4, where we need a generalization. (The coupling time lower bound proved in § 8 uses a different approach altogether.)

Lemma 5. Suppose a function $\Phi$ on the state space of a Markov chain satisfies $E[\Phi(X_{t+1}) \mid X_t] = (1 - \gamma)\,\Phi(X_t)$ and $E[(\Delta\Phi)^2 \mid X_t] \le R$, where $\Delta\Phi = \Phi(X_{t+1}) - \Phi(X_t)$. If the number of Markov chain steps $t$ is bounded by

$$t \le \frac{\log \Phi_{\max} + \frac{1}{2}\log\frac{\gamma\varepsilon}{4R}}{-\log(1 - \gamma)}$$

and $0 < \gamma \le 2 - \sqrt{2} \approx 0.58$ (or else $0 < \gamma \le 1$ and $t$ is odd), then the variation distance from stationarity is at least $1 - \varepsilon$.

Before proving Lemma 5, we show how to use it to prove Theorem 4.

Proof of Theorem 4. By Lemma 1, our function $\Phi$ satisfies the contraction property required by Lemma 5 when $\beta = \pi$, and

$$\gamma = \frac{1 - \cos(\pi/n)}{n-1} \sim \frac{\pi^2}{2n^3}.$$

The constraint on $\gamma$ is satisfied when $n \ge 3$.
To get a bound $R$, observe that any path $h$ can have at most $2\min(a, b)$ local extrema, so $\Pr[\Delta\Phi \ne 0] \le \min(a, b)/(n - 1)$. But $|\Delta\Phi| \le 1$, so $\max_h E[(\Delta\Phi(h))^2] \le \min(a, b)/(n - 1) = R$.

The maximal path maximizes $\Phi$, giving $\Phi_0 = \Theta(ab)$. Substituting into Lemma 5 and simplifying, $\gamma/R \asymp 1/(n^2 \min(a, b))$. Since $ab/n$ lies between $\min(a,b)/2$ and $\min(a,b)$, the numerator becomes $\log(ab) - \log n - \frac{1}{2}\log\min(a, b) + O(1) = \frac{1}{2}\log\min(a, b) + O(1)$ for bounded values of $\varepsilon$, giving our lower bound of $(1/\pi^2 - o(1))\,n^3 \log\min(a, b)$.

□
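To see how slowly the $o(1)$ in Theorem 4 vanishes, the sketch below, our own illustration, plugs the exact $\gamma$, $R$ and $\Phi_{\max}$ from the proof into the step bound of Lemma 5. It uses $a = b = m$ and $\varepsilon = 1/2$ (an arbitrary choice) and divides by the leading term $(1/\pi^2)\,n^3 \log\min(a,b)$. The ratio increases toward 1 only logarithmically, which reflects the $O(1)$ term in the numerator.

```python
import math

def phi_max(a, b):
    """Phi (beta = pi) of the topmost path 1^b 0^a: h(k) = min(k, b) - b k / n."""
    n = a + b
    return sum((min(k, b) - b * k / n) * math.cos(math.pi * (k - n / 2) / n)
               for k in range(1, n))

def lemma5_steps(a, b, eps):
    """Largest t allowed by Lemma 5 with gamma, R and Phi_max as in the proof of Theorem 4."""
    n = a + b
    gamma = (1 - math.cos(math.pi / n)) / (n - 1)
    R = min(a, b) / (n - 1)
    num = math.log(phi_max(a, b)) + 0.5 * math.log(gamma * eps / (4 * R))
    return num / -math.log1p(-gamma)

def ratio_to_leading_term(m, eps=0.5):
    """Lemma 5 step count for a = b = m, divided by (1/pi^2) n^3 log min(a, b)."""
    n = 2 * m
    return lemma5_steps(m, m, eps) / (n ** 3 * math.log(m) / math.pi ** 2)

for m in (10**3, 10**4, 10**5):
    print(m, round(ratio_to_leading_term(m), 3))
```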

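Proof of Lemma 5. Let $\Phi_t = \Phi(X_t)$. By induction,

$$E[\Phi_t \mid \Phi_0] = \Phi_0 (1 - \gamma)^t.$$

By our assumptions on $\gamma$, in equilibrium $E[\Phi] = 0$. With $\Delta\Phi$ denoting $\Phi_{t+1} - \Phi_t$, we have

$$E[\Phi_{t+1}^2 \mid X_t] = (1 - 2\gamma)\,\Phi_t^2 + E[(\Delta\Phi)^2 \mid X_t] \le (1 - 2\gamma)\,\Phi_t^2 + R,$$

and so by induction,

$$E[\Phi_t^2 \mid \Phi_0] \le \Phi_0^2 (1 - 2\gamma)^t + \frac{R}{2\gamma}.$$

Then, subtracting $E[\Phi_t]^2$,

$$\operatorname{Var}[\Phi_t] \le \Phi_0^2\left[(1 - 2\gamma)^t - (1 - \gamma)^{2t}\right] + \frac{R}{2\gamma},$$

$$\operatorname{Var}[\Phi_t] \le \frac{R}{2\gamma}$$

for each $t$. To get the last line we used our constraints on $\gamma$ and $t$: $(1 - \gamma)^2 = 1 - 2\gamma + \gamma^2 \ge 1 - 2\gamma$, so when $t$ is odd, $(1 - \gamma)^{2t} \ge (1 - 2\gamma)^t$. When $t$ is even, we need $(1 - \gamma)^2 \ge 2\gamma - 1$ as well, which is satisfied when $\gamma \le 2 - \sqrt{2}$ or $\gamma \ge 2 + \sqrt{2}$. From Chebyshev's inequality,

$$\Pr\left[\,|\Phi_t - E[\Phi_t]| \ge \sqrt{R/(2\gamma\varepsilon)}\,\right] \le \varepsilon.$$

Since $E[\Phi_\infty] = 0$, suppose that $E[\Phi_t] \ge \sqrt{4R/(\gamma\varepsilon)}$. Then the probability that $\Phi_t$ deviates below $\sqrt{R/(\gamma\varepsilon)}$ is at most $\varepsilon/2$, and the probability that $\Phi$ in stationarity deviates above this threshold is at most $\varepsilon/2$. Hence the variation distance between the distribution at time $t$ and stationarity must be at least $1 - \varepsilon$. If we take the initial state to be the one maximizing $\Phi$, then $E[\Phi_t] = \Phi_{\max}(1 - \gamma)^t \ge \sqrt{4R/(\gamma\varepsilon)}$ when

$$t \le \frac{\log \Phi_{\max} + \frac{1}{2}\log\frac{\gamma\varepsilon}{4R}}{-\log(1 - \gamma)}.$$

□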
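For the path chain, the two moment facts used in this proof can be verified exactly on a tiny example. The sketch below is our illustration, with hypothetical helper names. It propagates the exact distribution of the chain from the top path for $a = b = 3$, and at each step checks that $E[\Phi_t] = \Phi_0(1-\gamma)^t$ and that $\operatorname{Var}[\Phi_t] \le R/(2\gamma)$. The parameters are $\beta = \pi$, $\gamma = (1-\cos(\pi/n))/(n-1)$ and $R = \min(a,b)/(n-1)$, as in the proof of Theorem 4.

```python
import math

def phi(path):
    """Displacement (beta = pi) of a 0/1 path: h(k) = (ups among first k letters) - b k / n."""
    n, b = len(path), path.count("1")
    ups, total = 0, 0.0
    for k in range(1, n):
        ups += path[k - 1] == "1"
        total += (ups - b * k / n) * math.cos(math.pi * (k - n / 2) / n)
    return total

def step_distribution(dist):
    """Exact one-step evolution of a distribution {path: probability}."""
    out = {}
    for s, p in dist.items():
        n = len(s)
        w = p / (2 * (n - 1))          # each (site, coin) outcome is equally likely
        for i in range(n - 1):
            for up in (False, True):
                t = s[:i] + "".join(sorted(s[i:i + 2], reverse=up)) + s[i + 2:]
                out[t] = out.get(t, 0.0) + w
    return out

def moments(dist):
    """Mean and variance of Phi under the distribution."""
    mean = sum(p * phi(s) for s, p in dist.items())
    second = sum(p * phi(s) ** 2 for s, p in dist.items())
    return mean, second - mean ** 2

a = b = 3
n = a + b
gamma = (1 - math.cos(math.pi / n)) / (n - 1)
R = min(a, b) / (n - 1)
top = "1" * b + "0" * a
phi0 = phi(top)
dist = {top: 1.0}
history = []
for t in range(1, 301):
    dist = step_distribution(dist)
    history.append((t, *moments(dist)))
```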



3.4. Intuition. Since the expected value of the new height function is a certain local average of the current height function, the evolution of the height function $h$ (or rather its expected value) proceeds approximately according to the rule

$$\frac{\partial h}{\partial t} = \frac{1}{2(n-1)}\,\frac{\partial^2 h}{\partial x^2}. \qquad (7)$$

Since the equation is linear in $h$, it is natural to consider its eigenfunctions, which are just the sinusoidal functions that are zero at the boundaries. We can decompose any given height function into a linear combination of these sinusoidal components, and consider the evolution of each component independently. The displacement function $\Phi$ (when $\beta = \pi$) is just the coefficient of the principal mode (the sinusoidal function with the longest period) when we decompose the height function in this way. The coefficient of the principal mode decays the most slowly, making it the most useful for purposes of establishing a lower bound. For purposes of establishing an upper bound, the difference between two height functions also obeys Equation (7). We used the fact that when two paths, one above the other, have the same displacement, then they are the same path. The coefficient of the principal mode is the only one for which we can guarantee this property, since the other eigenfunctions take on both positive and negative values in the interior. Thus the principal mode essentially controls the rate of convergence, making its coefficient a natural displacement function.
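The eigenfunction picture can be checked directly on the averaged update from the proof of Lemma 1. In the sketch below, which is our illustration, `expected_step` applies $h(x) \mapsto h(x) + \frac{1}{n-1}\left(\frac{h(x-1)+h(x+1)}{2} - h(x)\right)$ at interior sites. Each Dirichlet sine mode $\sin(k\pi(x+n/2)/n)$ is then multiplied by $1 - (1-\cos(k\pi/n))/(n-1)$. The $k = 1$ mode, which equals $\cos(\pi x/n)$, decays most slowly.

```python
import math

def expected_step(h):
    """Expected height after one step (h has n + 1 entries, zero at both ends)."""
    n = len(h) - 1
    return ([0.0]
            + [h[j] + ((h[j - 1] + h[j + 1]) / 2 - h[j]) / (n - 1) for j in range(1, n)]
            + [0.0])

def mode(n, k):
    """k-th sine mode evaluated at sites j = 0..n (x = j - n/2)."""
    return [math.sin(k * math.pi * j / n) for j in range(n + 1)]

def decay_factor(n, k):
    """Per-step multiplier of mode k under the averaged update."""
    return 1 - (1 - math.cos(k * math.pi / n)) / (n - 1)

n = 12
factors = [decay_factor(n, k) for k in range(1, n)]
print([round(f, 5) for f in factors])
```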

4. Card shuffling by adjacent transpositions 

Next, we analyze the card shuffling Markov chain that transposes random adjacent pairs of cards. (This Markov chain is a special case of the move-ahead-one update rule that has been studied in self-organizing linear search; Hester and Hirschberg (1985) give a survey.) We will consider this Markov chain to be implemented according to the following rule. Pick a random adjacent pair $(i, i+1)$ of cards, flip a coin $c$, and then sort the items in that adjacent pair if heads, otherwise reverse-sort them. The same random update defined by $i$ and $c$ may be applied to more than one permutation to obtain coupled Markov chains.

A permutation on the numbers $1, \ldots, n$ has associated with it $n + 1$ threshold functions. The $i$th threshold function ($0 \le i \le n$) is a string of $i$ 1's and $n - i$ 0's, with the 1's at the locations of the $i$ largest numbers of the permutation. The permutation can be recovered from these threshold functions simply by adding them up (see Figure 3). When a random adjacent pair of numbers within the permutation is transposed, the effect on any given threshold function is to transpose the same adjacent pair of 0's and 1's. The identity permutation $12\cdots n$ and its reverse $n(n-1)\cdots 1$ give the minimal and maximal paths for any threshold function. So when the coupled Markov chains started at these two permutations coalesce (take on the same value), the grand coupling would take any starting permutation to this same value. We can therefore use our analysis of the Markov chain on lattice paths to analyze the Markov chain on permutations.

Figure 3: The permutation 2653741 and its 8 threshold functions shown as lattice paths.
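The threshold-function decomposition is easy to check for the permutation of Figure 3. The sketch below is our illustration; `thresholds` and `update` are hypothetical names. It builds the $n + 1$ threshold strings and confirms that they sum to the permutation. It also verifies that applying a random update $(i, c)$ to the permutation and then taking thresholds gives the same result as applying that update to every threshold string.

```python
import random

def thresholds(perm):
    """The n + 1 threshold functions: the ith has 1's where perm holds its i largest values."""
    n = len(perm)
    return ["".join("1" if v > n - i else "0" for v in perm) for i in range(n + 1)]

def update(seq, i, heads):
    """Grand-coupling move (i, c): sort positions i, i+1 if heads, else reverse-sort."""
    seq = list(seq)
    seq[i:i + 2] = sorted(seq[i:i + 2], reverse=not heads)
    return seq

perm = [2, 6, 5, 3, 7, 4, 1]
ths = thresholds(perm)
print(ths[3], ths[4])        # 0110100 0110110
recovered = [sum(int(t[pos]) for t in ths) for pos in range(len(perm))]
```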

Theorem 6. After $(1/\pi^2 - o(1))\,n^3 \log n$ shuffles, the variation distance from stationarity is $1 - o(1)$, and the probability of coalescence is $o(1)$. After $(6/\pi^2 + o(1))\,n^3 \log n$ shuffles, the variation distance from stationarity is $o(1)$, and the probability of coalescence is $1 - o(1)$.

We will prove better upper bounds in § 6 and § 8.4; the point here is to give a quick and easy proof.




Proof of Theorem 6. The lower bound comes from considering the $\lfloor n/2 \rfloor$th threshold function. By Theorem 4, after $[1/\pi^2 - o(1)]\,n^3 \log n$ steps the variation distance of just this one threshold function from stationarity is $1 - o(1)$. The variation distance from stationarity of the permutation itself is at least as large.

The upper bound follows from Theorem 2 when we take $\varepsilon = \delta/n$. Since $ab/(\delta/n) < n^3/\delta$, after $[2/\pi^2 + o(1)]\,n^3 \log(n^3/\delta)$ steps the probability that any one given threshold function differs for the upper and lower permutations is $< \delta/n$. The probability that the upper and lower permutations differ is at most the expected number of threshold functions for which they differ, which is at most $\delta$. Taking $\delta \ll 1$ but $\log(1/\delta) \ll \log n$, after $[6/\pi^2 + o(1)]\,n^3 \log n$ steps coalescence occurs with probability $1 - o(1)$. □
5. Lozenge tilings

5.1. Background. Random tilings, or equivalently random perfect matchings, were originally studied in the physics community as a model for dimer systems (see e.g. (Fisher, 1961) and (Kasteleyn, 196?) and references contained therein). Physicists were interested in properties such as long-range correlations in the dimers. In the case of lozenge tilings, the dimers are the lozenges, and the monomers are the two regular triangles contained in a lozenge. In recent years mathematicians have been studying random lozenge tilings (among other types of random tilings), and have proved a number of difficult theorems about their asymptotic properties. For instance, when a very large hexagonal region is randomly tiled by lozenges, with high probability the tiling will exhibit a certain circular shape. The density of each of the three orientations of lozenges, as a function of position, is also known (Cohn, Larsen, and Propp, 1998). (See also (Cohn, Kenyon, and Propp, 2001).) Observations of random lozenge tilings of very large regions played an important role in the history of these theorems, since they indicated what sort of results might be true before they were proved, thereby guiding researchers in their efforts.



Consequently, there have been several articles (Propp and Wilson, 1996) (Luby, Randall, and
Sinclair, 1995) (Wilson, 1997a) (Ciucu and Propp, 1996) on techniques to randomly generate lozenge
tilings and other types of tilings. The first two of these articles use a Markov chain approach, while
the last two use linear algebra. The first of these articles (Propp and Wilson, 1996) introduces
monotone-CFTP, which lets one efficiently generate random structures (e.g. lozenge tilings) using
special Markov chains, without requiring any knowledge about the convergence rate of the Markov
chain. It is the article by Luby, Randall, and Sinclair (1995) that is most relevant to us here.



In it they introduce novel Markov chains for generating lozenge tilings (and two other types of
structures). In this case knowledge of the mixing time of these Markov chains does not help with
the specific task of random generation, as monotone-CFTP, which determines on its own how long
to run a Markov chain, may be applied to each of these Markov chains. But there are still several
reasons to determine the mixing time: (1) in the same way that designers of efficient algorithms like
to prove that the algorithms actually are efficient, it is desirable to have a proof that the Markov
chain is rapidly mixing, (2) there are physical interpretations of the mixing properties of dimer
systems (see the discussion in Destainville (2001) and references contained therein), and (3) there
has been some speculation (Propp, 1995-1997) that knowledge of the convergence properties of
these Markov chains may be converted into knowledge about random tilings of the whole plane
(but this remains to be seen). For these reasons, Luby, Randall, and Sinclair establish polynomial
time bounds on the convergence rates of each of their Markov chains. In this section we substantially
improve the analysis of the lozenge tiling Markov chain, and in many cases our bounds differ by
just a constant factor from the true convergence rate. But as discussed in § 5.5, Luby, Randall, and
Sinclair also analyzed other Markov chains for which it is not clear how to apply the methods of
this section. Nonetheless, empirical studies suggest that these other Markov chains converge about
as quickly as the lozenge tiling Markov chain.

There is a well-known correspondence between dimers on the hexagonal lattice, lozenge tilings,
and nonintersecting lattice paths of the type we considered earlier. Figure 4 illustrates this
correspondence by showing a random perfect matching of a region of the hexagonal lattice, an equivalent
random tiling of a related region by lozenges, and an equivalent random collection of nonintersecting
lattice paths. Following Luby, Randall, and Sinclair we shall use the lattice path representation
of lozenge tilings.

5.2. Displacement function. In this section we apply the same techniques used to upper bound
the mixing time of the lattice path Markov chain to upper bound the mixing time of a Markov
chain for generating random lozenge tilings. There are several Markov chains for generating random
lozenge tilings of regions (each of which possesses the monotonicity property required by
monotone-CFTP); the one that we shall analyze was introduced by Luby, Randall, and Sinclair. Two of
these Markov chains use the lattice path representation of lozenge tilings, and may be viewed as
generalizations of the lattice path Markov chain that we studied already.

Figure 4. Shown starting in the upper left and proceeding clockwise are 1) a random perfect
matching (every vertex paired with exactly one neighbor) of a portion of the hexagonal lattice,
2) the same perfect matching where edges of the matching are shown as small lozenges, 3) same
as 2, but the lozenges are large enough to touch one another, forming a lozenge tiling of a certain
region, and 4) same as 3, but horizontal lozenges are represented as dots, while the other types are
represented as ascending or descending line segments, which form nonintersecting lattice paths.
These transformations are bijective, so that any set of nonintersecting lattice paths corresponds
to a lozenge tiling which in turn corresponds to a perfect matching of the original hexagonal
lattice graph.

Consider the Markov chain that picks a random point on some lattice path, and then randomly 
decides whether to try pushing it up or down. The Markov chain is connected because the top 
path can be pushed to its maximum height, then the next highest path, and so on, so that each 
configuration can reach a unique maximal configuration. (Similarly, there is a unique minimal 
configuration.) The Markov chain is aperiodic since pushing the same lattice point twice in the 
same direction results in no change the second time. The Markov chain is symmetric, so its unique 
stationary distribution is the uniform distribution. 

Remark: Luby, Randall, and Sinclair assumed that the region to be tiled is simply connected, since 
otherwise the lattice paths cannot cross the interior holes in the region, causing the state space to 
be disconnected. But if we restrict the state space to configurations with a specified set of lattice 
paths passing under each interior hole, then this restricted state space is connected by these local 
moves. Our mixing time upper bounds will apply to each such connected component of the state 
space if the region to be tiled is not simply connected. 

Such is the "local moves" Markov chain for lozenge tilings. Unfortunately, it is difficult to analyze
in the same way that we analyzed the path Markov chain because the paths must remain disjoint.
We would like to define the displacement $\Phi$ of the lozenge tiling to be the sum of the displacements
of each of its paths, but computing $E[\Delta\Phi]$ is difficult. We cannot compute the expected new height
of a point on a lattice path simply by looking at the heights of its neighbors, since another lattice
path may or may not be nearby and block its movement. Luby, Randall, and Sinclair introduced
"nonlocal moves" to circumvent this problem and make a Markov chain that is more tractable.

Consider a single lattice path site in isolation. Its height will change only if it is a local extremum
and gets pushed up (if it's a minimum) or down (if it's a maximum). The idea behind the nonlocal
moves is to preserve the expected change in height even if there are other lattice paths that might
block the movement of this local extremum. If there are $k$ paths blocking the movement of the local
extremum, then with probability $1/(k+1)$ the corresponding points in the $k+1$ paths are each
moved. (Naturally if the border of the region itself prevents the movement of the paths, then the
probability remains zero.) See Figure 5. This modified Markov chain is ergodic and symmetric (the
probability that these $k+1$ paths get pushed back to their original positions equals the probability
that they were moved in the first place), so the uniform distribution remains the unique stationary
distribution. And if there were no border effects, we would be able to compute $E[\Delta\Phi]$, since when
a site in a lattice path is selected by the Markov chain, the expected change in the total height
function is determined by that point and its immediate neighbors on the same path. For hexagonal
regions the borders cannot obstruct the movements of the lattice paths, and we are able to get both
upper and lower bounds for these chains. For general regions we still obtain a good upper bound
on the mixing time despite these "border effects".
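The symmetry claim for the tower moves can be checked by brute force on a small hexagon. The following Python sketch is our own illustration, not code from Luby, Randall, and Sinclair, and all helper names are hypothetical. It encodes a routing of the order-$\ell$ hexagon as a tuple of height profiles with heights doubled (steps of $\pm1$ instead of $\pm1/2$, so a push changes a height by 2), with path $i$ running from $(0, 2i)$ to $(2\ell, 2i)$; it picks an interior point uniformly and a direction with probability 1/2, and moves a blocked tower of $k+1$ paths with probability $1/(k+1)$.

```python
from collections import deque
from fractions import Fraction

def bottom_routing(ell):
    """Lowest routing of the order-ell hexagon, heights doubled to stay integral:
    path i goes from (0, 2i) to (2*ell, 2i), first down and then up."""
    w = 2 * ell
    return tuple(tuple(2 * i - min(x, w - x) for x in range(w + 1))
                 for i in range(ell))

def tower_move(h, i, x, d):
    """Try to push path i at column x in direction d (+1 up, -1 down).
    Returns (new_routing, probability of the tower move) or None."""
    if h[i][x - 1] != h[i][x + 1] or h[i][x] != h[i][x - 1] - d:
        return None                      # no extremum that can move in direction d
    tower, j = [i], i + d                # path i plus the k paths stacked against it
    while 0 <= j < len(h) and h[j][x] == h[tower[-1]][x] + 2 * d:
        tower.append(j)
        j += d
    rows = [list(r) for r in h]
    for j in tower:
        rows[j][x] += 2 * d              # the whole tower moves together ...
    return tuple(map(tuple, rows)), Fraction(1, len(tower))  # ... w.p. 1/(k+1)

def transitions(h):
    """Exact one-step transition probabilities out of h, self-loop included."""
    n_paths, w = len(h), len(h[0]) - 1
    site_prob = Fraction(1, n_paths * (w - 1))   # uniform over interior points
    out = {}
    for i in range(n_paths):
        for x in range(1, w):
            for d in (1, -1):                    # direction chosen w.p. 1/2
                move = tower_move(h, i, x, d)
                if move is not None:
                    nxt, q = move
                    out[nxt] = out.get(nxt, 0) + site_prob * Fraction(1, 2) * q
    out[h] = 1 - sum(out.values())
    return out

def state_space(ell):
    """All routings reachable from the bottom one (breadth-first search)."""
    start = bottom_routing(ell)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in transitions(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

states = state_space(2)
P = {s: transitions(s) for s in states}
symmetric = all(P[s].get(t, 0) == P[t].get(s, 0) for s in states for t in P[s])
print(len(states), symmetric)   # 20 True
```

For $\ell = 2$ the search finds the 20 lozenge tilings of the hexagon, and the transition matrix is symmetric, so the uniform distribution is stationary. Exact `Fraction` arithmetic is used so that the symmetry test is an equality rather than a tolerance.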

The constraints that the borders of the region impose are that certain locations of certain lattice 
paths have maximal or minimal values. For instance, at the start and end of a lattice path, the 
maximal and minimal values are identical. 

Let $w$ be the width of the lozenge-tiling region in the lattice path representation. That is, $w$ is
the distance between the leftmost start of a path and the rightmost end of a path. (For the path
Markov chain, $w$ was $n$.) As in the single path Markov chain, we find it convenient to center the
region about the origin, so that the $x$-coordinates of the points on the paths range from $-w/2$ to
$w/2$. For a given set of nonintersecting lattice paths (a.k.a. routing), let $h_i(x)$ be the height of
the $i$th path in the routing at the given $x$-coordinate. If the path includes $x$ and $x-1$, we have
$h_i(x) = h_i(x-1) \pm 1/2$, so that pushing the path up or down changes its height by 1. For convenience,
extend the definition of $h_i$ to locations $x$ where the $i$th path does not have a point, say by letting
$h_i(x)$ take on its maximum possible value consistent with the constraint $h_i(x) = h_i(x-1) \pm 1/2$.



14 



DAVID B. WILSON 




Figure 5. A tower move for the Luby-Randall-Sinclair Markov chain for lozenge
tilings. If the chain attempts to push up the third path from the bottom in the gray
area, then it and the two paths blocking it are all three pushed up together with
probability 1/3.

The "displacement" function of $h$ that we will use is

$$\Phi(h) = \sum_i \sum_{x=-w/2}^{w/2} h_i(x)\cos\frac{\beta x}{w}, \qquad (8)$$

where $0 < \beta < \pi$. This function $\Phi$ is the natural generalization of our earlier lattice path
displacement function, but in the context of lozenge tilings we put "displacement" in quotes because it
does not measure displacement from anything in particular.

Suppose that the Markov chain picks a particular location $x$ on the $i$th path to try randomly
pushing up or down. Let $h'_i(x)$ be the updated value of the height at that location. If the $i$th path
does not have an extremum at $x$, then $h'_i(x) = E[h'_i(x)] = h_i(x) = [h_i(x-1) + h_i(x+1)]/2$, so

$$E[\Delta\Phi(h)] = \left[\frac{h_i(x-1) + h_i(x+1)}{2} - h_i(x)\right]\cos\frac{\beta x}{w}. \qquad (9)$$

Suppose instead that the $i$th path does have an extremum at $x$. In the absence of border constraints
or interactions with other paths, the extremum is pushed (up if it is a minimum, down if it is a
maximum) with probability 1/2, so that we would have $E[h'_i(x)] = [h_i(x-1) + h_i(x+1)]/2$. Next we take
into account interactions amongst the paths, but not border effects yet. Suppose the Markov chain
attempts to push the given local extremum in the opposite direction that it is pointing, but that there
are $k$ paths blocking its movement. Because we are using nonlocal moves, with probability $1/(k+1)$ each
of these $k+1$ paths is moved at location $x$, with each changing $\Phi$ by $\pm\cos(\beta x/w)$, so the expected
change in $\Phi$ is the same as for an unblocked extremum. Thus Equation (9) continues to hold true.

The only reason that (9) might fail is if path $i$ has a local maximum at $x$ and pushing it down
violates the border constraints (in which case the $=$ in (9) becomes $>$), or the path has a local
minimum at $x$ and pushing it up violates the border constraints (where the $=$ in (9) becomes $<$). For
some regions, such as the hexagon, all the paths start at $-w/2$ and end at $w/2$, and the only border
effects are that the endpoints of the paths stay fixed. For such regions Equation (9) always holds for
each path and each $x$ such that $-w/2 < x < w/2$, and it holds for $x = \pm w/2$ as well when $\beta = \pi$
(both sides then vanish, since $\cos(\pm\pi/2) = 0$). Using this equality we will derive a lower bound on
the mixing time when the region is a hexagon. For general regions it is difficult to obtain a lower
bound on the mixing time, but we still obtain an upper bound.

5.3. Mixing time upper bound. To get the upper bound we will work with a pair of routings
with heights $\hat h_i$ and $\check h_i$, such that each path in the first routing lies above the corresponding
path in the second routing. How we extended the definition of $\hat h_i$ and $\check h_i$ to locations $x$ where
path $i$ does not exist was a bit arbitrary, but we did it the same way in both routings so that at these
locations $\hat h_i(x) - \check h_i(x) = 0$. Then the gap function between the two routings is $g = \hat h - \check h$,
and the gap is $\Phi(g) = \Phi(\hat h) - \Phi(\check h)$, which is zero when the routings are the same, and
positive otherwise.

When we pick the same location $x$ on the same path in both routings, and randomly push in the
same direction, then from Equation (9) we get

$$E[\Delta\Phi(g)] = E[\Delta\Phi(\hat h) - \Delta\Phi(\check h)] = \left[\frac{g_i(x-1) + g_i(x+1)}{2} - g_i(x)\right]\cos\frac{\beta x}{w} \qquad (10)$$

unless the borders influence $E[\Delta\Phi(g)]$. Suppose that the borders influence $E[\Delta\Phi(\hat h)]$ to be
larger than the value given by Equation (9). Then $\hat h_i(x)$ takes on its minimal possible value, and is a
local maximum, so $\hat h_i(x\pm1)$ also take on their minimum possible values. But $\hat h_i$ dominates $\check h_i$,
so $\check h_i(x)$ and $\check h_i(x\pm1)$ also assume these same minimum possible values. Thus the right-hand
side of (10) is zero. Since $\hat h_i(x)$ and $\check h_i(x)$ are immobile, the left-hand side is zero as well, so
(10) continues to be true. Similarly, if the borders influence $E[\Delta\Phi(\check h)]$ to be smaller than the
value given by Equation (9), Equation (10) continues to hold true. If, on the other hand, the borders
influence $E[\Delta\Phi(\hat h)]$ to be smaller than the value specified by Equation (9), and/or influence
$E[\Delta\Phi(\check h)]$ to be larger than that specified by (9), then we may replace the second equality in
Equation (10) with $\le$ to obtain a true statement.

Let $p$ denote the number of internal points on the paths in the routing, i.e. the number of places
where the Markov chain might try to push a path up or down. (The $p$ in this section was $n-1$ in
the section on the path Markov chain.) Then when we stop conditioning on a particular site of a
certain path getting pushed, we use the same derivation used earlier for the single lattice path to
conclude that

$$E[\Delta\Phi(g)] \le -\frac{1-\cos(\beta/w)}{p}\,\Phi(g),$$

with equality when $\beta = \pi$, all paths start at $-w/2$ and end at $w/2$, and the only restrictions on
the locations of the paths are that their endpoints are pinned down and they do not intersect.
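For the hexagon, the equality case can be checked exhaustively. The following Python sketch is our own check, not part of the article, and its helper names are hypothetical. It uses the same doubled-height encoding as a convenience (path $i$ from $(0,2i)$ to $(2\ell,2i)$, steps $\pm1$), applies it to a single routing $h$ rather than to a gap, and takes $p$ to be the number of interior path points, $\ell(2\ell-1)$ in this encoding. One assumption matters: heights are shifted by $\ell - 1$ so that the endpoint heights sum to zero; without that shift the boundary terms of the summation by parts do not cancel for a single routing (for a gap $g$ they vanish automatically, since $g = 0$ at the endpoints).

```python
import math
from collections import deque

def bottom_routing(ell):
    """Order-ell hexagon with doubled heights: path i from (0, 2i) to (2*ell, 2i)."""
    w = 2 * ell
    return tuple(tuple(2 * i - min(x, w - x) for x in range(w + 1)) for i in range(ell))

def tower_moves(h):
    """Yield (probability, new_routing) for every tower move out of routing h."""
    n_paths, w = len(h), len(h[0]) - 1
    for i in range(n_paths):
        for x in range(1, w):
            for d in (1, -1):
                if h[i][x - 1] != h[i][x + 1] or h[i][x] != h[i][x - 1] - d:
                    continue  # no extremum that can be pushed in direction d
                tower, j = [i], i + d
                while 0 <= j < n_paths and h[j][x] == h[tower[-1]][x] + 2 * d:
                    tower.append(j)
                    j += d
                rows = [list(r) for r in h]
                for j in tower:
                    rows[j][x] += 2 * d
                # site uniform, direction w.p. 1/2, tower of k+1 paths w.p. 1/(k+1)
                prob = 1 / (n_paths * (w - 1)) / 2 / len(tower)
                yield prob, tuple(map(tuple, rows))

def phi(h, ell):
    """Equation (8) with beta = pi, columns recentred to -w/2..w/2 and heights
    shifted by ell - 1 so that the endpoint heights sum to zero."""
    w = 2 * ell
    return sum((hx - (ell - 1)) * math.cos(math.pi * (x - ell) / w)
               for path in h for x, hx in enumerate(path))

def check_drift(ell):
    """Compare E[Delta Phi] with -(1 - cos(pi/w)) Phi / p on every routing."""
    w, p = 2 * ell, ell * (2 * ell - 1)
    rate = (1 - math.cos(math.pi / w)) / p
    start = bottom_routing(ell)
    seen, queue, ok = {start}, deque([start]), True
    while queue:
        h = queue.popleft()
        drift = sum(q * (phi(t, ell) - phi(h, ell)) for q, t in tower_moves(h))
        ok = ok and math.isclose(drift, -rate * phi(h, ell), abs_tol=1e-12)
        for _, t in tower_moves(h):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return ok, len(seen)

print(check_drift(2), check_drift(3))  # (True, 20) (True, 980)
```

The state counts 20 and 980 are the numbers of lozenge tilings of the hexagons of order 2 and 3, so every routing is visited.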
Then we use the same argument used earlier for the single lattice path upper bound to find that the
Markov chain is within $\epsilon$ of uniformity after

$$\frac{2+o(1)}{\beta^2}\,p w^2 \log\frac{m}{\Phi_{\min}\,\epsilon}$$

steps, where $m$ is the number of (local) moves separating the upper and lower configurations, and
$\Phi_{\min} \ge \cos(\beta/2) \approx (\pi-\beta)/2$. Taking $\beta = \pi - \Theta(1/\log n)$ as before yields

Theorem 7. For the Luby-Randall-Sinclair lozenge-tiling Markov chain on a region which has
width $w$, has $m$ (local) moves separating the top and bottom configurations, and contains $p$ places
where lattice paths may be moved, after

$$\frac{2+o(1)}{\pi^2}\,p w^2 \log\frac{m}{\epsilon}$$

steps, coalescence will occur except with probability $\epsilon$. (For regions which are not simply connected,
we mean coalescence within a given connected component of the state space.)
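The passage from the $\beta$-dependent bound $\frac{2+o(1)}{\beta^2}\,pw^2\log\frac{m}{\Phi_{\min}\epsilon}$ to Theorem 7 can be spelled out as follows. This expansion is ours; it uses only $\Phi_{\min} \ge \cos(\beta/2)$ and $\beta = \pi - \Theta(1/\log n)$, and it assumes, as in the regime of interest, that $\log\log n = o(\log(m/\epsilon))$:

$$\frac{1}{\beta^2} = \frac{1}{\pi^2}\bigl(1 + O(1/\log n)\bigr), \qquad \log\frac{1}{\Phi_{\min}} \le \log\frac{1}{\cos(\beta/2)} = \log\frac{1}{\sin\bigl((\pi-\beta)/2\bigr)} = O(\log\log n),$$

$$\frac{2+o(1)}{\beta^2}\,pw^2\log\frac{m}{\Phi_{\min}\,\epsilon} = \frac{2+o(1)}{\pi^2}\,pw^2\Bigl(\log\frac{m}{\epsilon} + O(\log\log n)\Bigr) = \frac{2+o(1)}{\pi^2}\,pw^2\log\frac{m}{\epsilon}.$$

The choice of $\beta$ balances the two error sources: moving $\beta$ closer to $\pi$ improves $1/\beta^2$ but shrinks $\Phi_{\min}$, and a gap of order $1/\log n$ keeps both losses inside the $o(1)$.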

For the bound stated in the introduction we used $p \le n$ together with the bound on $m$ given by
(Luby, Randall, and Sinclair, 1995).



For the hexagon of order $\ell$, $p = 2\ell(\ell-1)$, $w = 2\ell$, and $m = \ell^3$, so our mixing time bound is

$$\frac{16+o(1)}{\pi^2}\,\ell^4\log\frac{\ell^3}{\epsilon}.$$

5.4. Lower bound for the hexagon.

Theorem 8. For the regular hexagon with side length $\ell$, the mixing time of the Luby-Randall-Sinclair
Markov chain is at least $(8/\pi^2 - o(1))\ell^4\log\ell$.

Proof. We apply our general mixing time lower bound lemma here to lower bound the mixing time of the
lozenge tiling Markov chain proposed by Luby, Randall, and Sinclair, when the region is a regular
hexagon with side lengths $\ell$. Our potential function has the required contraction property, with
$\gamma \approx \beta^2/(2pw^2) \approx \pi^2/(16\ell^4)$ (here $\beta = \pi$), and $\Phi_{\max} = \Theta(\ell^3)$. Since we are
using nonlocal moves, $|\Delta\Phi|$ can be as large as $\ell$. Suppose that when the Markov chain picks a site
on a path and tries to push it, there are $k$ paths in the way. With probability $1/(k+1)$,
$\Delta\Phi = \pm(k+1)\cos(\beta x/w)$, and otherwise $\Delta\Phi = 0$. Conditioning on this site being selected,
$E[(\Delta\Phi)^2] \le k+1$, and in general $E[(\Delta\Phi)^2] \le R = \ell$. Applying the lemma, we obtain (for fixed $\epsilon$)
a mixing time lower bound of

$$\frac{(16+o(1))\ell^4}{\pi^2}\left[3\log\ell + \tfrac12\log(\epsilon/\ell^5)\right] = \frac{8+o(1)}{\pi^2}\,\ell^4\log\ell. \qquad \square$$



5.5. The other Luby-Randall-Sinclair Markov chains. In addition to the Markov chain for
lozenge tilings, Luby, Randall, and Sinclair introduced Markov chains for domino tilings and Eulerian
orientations. Randall (1998) has pointed out that our analysis in this section is readily adapted, more
or less unchanged, to their domino tiling Markov chains. It is much less obvious how to adapt the
analysis to the Eulerian orientation chains. The reason for this difference is that for the Eulerian
orientation Markov chains, the nonlocal "tower moves" overlap one another in a criss-cross fashion,
whereas in the lozenge tiling and domino tiling chains the towers are parallel to one another. We
effectively gave each local move within a given tower the same weight, and if we do the same for the
Eulerian orientation chain, we get the trivial weighting, which does not have the desired contraction
property.

5.6. The local move Markov chains. For a "normal" $\ell \times \ell$ region one might expect that for
typical configurations, the towers in the nonlocal tower moves are fairly short, which suggests that
while the Luby-Randall-Sinclair Markov chain is much nicer to analyze rigorously, it may not be
much faster on these regions than the local moves Markov chain. (However, for certain contrived
regions, such as a pencil-shaped region consisting of one long tower, the LRS Markov chain will
be much faster than the local moves Markov chain.) Thus we have a heuristic prediction that the
local moves chain takes $\Theta(\ell^4\log\ell)$ time to mix in "normal" $\ell \times \ell$ regions. Cohn (1995) tested this
prediction by doing coupling time experiments, and reported that it was "about right." Then Henley
(1997) did some detailed heuristic calculations and predicted that the relaxation time (reciprocal
of the spectral gap) was Q{£'^) for a variety of models that have associated height functions. In 



an interesting recent development, Destainville (2001) experimented with the local moves chain for
rhombus tilings of octagonal regions where there are $6 = \binom{4}{2}$ types of rhombuses, and concluded
that the $\Theta(\ell^4\log\ell)$ estimate holds for these tilings as well.

From a rigorous standpoint, Randall and Tetali (2000) established a polynomial time upper
bound on the mixing time of the local moves chain. Their approach was to use the mixing time
bound from Theorem 7 on the nonlocal moves Markov chain to obtain a bound on the chain's
spectral gap, use techniques developed by Diaconis and Saloff-Coste (1993b) to compare the spectral
gaps of the local and nonlocal Markov chains, and then derive a mixing time bound for the local
moves chain from its spectral gap. Their local moves mixing time bound was $O(n^2w^2h^2\log n)$,
where $n$ is the area, $w$ is the width, and $h$ is the height, or in other words $O(\ell^8\log\ell)$ for $\ell \times \ell$
regions. If rather than starting from our mixing time bound on the nonlocal Markov chain, one
instead starts from our bound on the spectral gap (which was not explicitly given in an earlier
version of this article), then the log factors disappear from the mixing time bound of the local
moves Markov chain.



6. The Karzanov-Khachiyan Markov chain 

As mentioned in the introduction, random generation of linear extensions can be used to
approximately count the number of linear extensions of a partially ordered set, which is a #P-complete
problem (Brightwell and Winkler, 1991). Dyer, Frieze, and Kannan (1991) showed one can generate
an (approximately) random linear extension of a partial order in polynomial time, using a
Markov chain on a certain polytope. Matthews (1991) gave a different geometric Markov chain



for random linear extensions that runs in time 0{hbQn^\o^ nlogl/e). Karzanov and Khachiyan 
(19911) gave a combinatorial Markov chain for linear extensions, and showed that it mixes in time 
8n^ log(|il|/e) < O(n^logn), where |0| is the number of linear extensions. Dyer and Frieze (1991) 



improved the mixing time bound to 0(n^ log(|0|/e)) < 0(n^ log n). [Felsner and Wernisch (1997 



showed that the Karzanov-Khachiyan Markov chain mixes in time 0{n'^ logn) for a certain class of 



partial orders, and that one can obtain an unbiased sample in this time. Bubley and Dyer (1998)
showed that a related Markov chain mixes in time $O(n^3\log n)$, and that the original Karzanov-Khachiyan
Markov chain mixes in time $O(n^3\log n\log(|\Omega|/\epsilon)) \le O(n^4\log^2 n)$. We show here that
the Karzanov-Khachiyan Markov chain mixes in time $O(n^3\log n)$, and exhibit a partial order for
which the Karzanov-Khachiyan Markov chain and Bubley and Dyer's variation of it both need
order $n^3\log n$ steps before they begin to get close to being random.

(Despite this progress, it still takes 0{n^ n log(n/e)) time to approximately count linear 
extensions to within a factor $1+\epsilon$ (Bubley and Dyer, 1998). One can count the linear extensions of
series-parallel posets much more quickly, so the aforementioned data-mining application (Mannila
and Meek, 2000) restricted its attention to these posets.)

Bubley and Dyer (1998) use their simple yet powerful method of path coupling (Bubley and
Dyer, 1997) to bound the mixing time of Markov chains related to the Karzanov-Khachiyan Markov
chain for random linear extensions. In their generalization, the items at positions $i$ and $i+1$ are
considered with probability $f(i)$, and the Markov chain transposes these items with probability 1/2,
provided that doing so does not violate the partial order. For the Karzanov-Khachiyan Markov
chain, $f(i) = 1/(n-1)$ for $i = 1, \ldots, n-1$. Bubley and Dyer show that if $f(i)$ is given by a
parabola, $f(i) \propto i(n-i)$, then the Markov chain mixes in time $(1/3 + o(1))n^3\log n$, and then argue
using eigenvalue comparison techniques that the original Karzanov-Khachiyan chain mixes in time
no larger than $O(n^4\log^2 n)$.

We show here how to generalize Bubley and Dyer's analysis of these Markov chains, and obtain
an upper bound of $(4/\pi^2 + o(1))n^3\log n$ for the Karzanov-Khachiyan Markov chain. We remark
that $4/\pi^2 \approx 0.405$ is about 22% larger than $1/3$, but selecting a uniformly random location is easier
than selecting one according to a parabolic distribution. We will see in § 7 that by doing
updates in "sweeps" rather than at independent uniformly random locations, the required number
of transpositions can be cut in half. So this analysis marginally improves but does not significantly
impact the time needed to generate random linear extensions. Mainly it serves to illustrate the
utility of the technique used throughout this article for analyzing the mixing time of a variety of
Markov chains that have been studied before.

What we do is simply add weights to the distance function between linear extensions. If positions
$i$ and $j > i$ are transposed, Bubley and Dyer defined the "width" of the transposition to be $j - i$.
We will define the width to be $w(i,j) = \sum_{i \le k < j} w(k)$, where the $w(k) > 0$ are to be chosen later.
Given two linear extensions $X$ and $Y$ of the partial order, a transposition sequence was defined to
be a sequence of linear extensions $X = Z_0, Z_1, \ldots, Z_r = Y$ such that $Z_t$ and $Z_{t+1}$ differ by a single
transposition. The weight of a transposition sequence is the sum of the widths of the transpositions,
and Bubley and Dyer define the distance $\delta(X,Y)$ between linear extensions $X$ and $Y$ to be the
weight of the minimum weight transposition sequence.

Bubley and Dyer show in an appendix that (when each $w(i) = 1$) $\delta(X,Y) = \delta_S(X,Y)$,
where $\delta_S$ is Spearman's footrule, which is defined by $\delta_S(X,Y) = \frac12\sum_{i=1}^n |X(i) - Y(i)|$, with
$X(i)$ denoting the position of item $i$ in $X$. The proof is also valid in the weighted scenario, when the
definition of $\delta_S$ is generalized to $\delta_S(X,Y) = \frac12\sum_{i=1}^n w\bigl(\min(X(i),Y(i)), \max(X(i),Y(i))\bigr)$.
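The identity $\delta = \delta_S$ can be sanity-checked numerically in the simplest setting, the trivial partial order, where every permutation is a linear extension. The Python sketch below is our own illustration (function names are hypothetical; positions are 0-based, so `w` is a list of the $n-1$ weights $w(0), \ldots, w(n-2)$). It computes the minimum-weight transposition distance by Dijkstra's algorithm over all permutations of 5 items, with random positive weights, and compares it with the weighted footrule. It says nothing about nontrivial partial orders, where the appendix argument of Bubley and Dyer is needed.

```python
import heapq
import itertools
import math
import random

def width(w, i, j):
    """Width w(i, j) = w(i) + ... + w(j-1) of transposing positions i < j."""
    return sum(w[i:j])

def weighted_footrule(X, Y, w):
    """delta_S(X, Y) = 1/2 * sum over items of w(min(pos), max(pos))."""
    pos_x = {item: p for p, item in enumerate(X)}
    pos_y = {item: p for p, item in enumerate(Y)}
    return 0.5 * sum(width(w, min(pos_x[a], pos_y[a]), max(pos_x[a], pos_y[a]))
                     for a in X)

def transposition_distance(X, Y, w):
    """Minimum total width of a transposition sequence from X to Y.
    Plain Dijkstra over all permutations: only sensible for tiny n."""
    start, goal = tuple(X), tuple(Y)
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, z = heapq.heappop(heap)
        if z == goal:
            return d
        if d > dist[z]:
            continue  # stale heap entry
        for i, j in itertools.combinations(range(len(z)), 2):
            nxt = list(z)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            nxt = tuple(nxt)
            nd = d + width(w, i, j)
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf

random.seed(0)
n = 5
w = [random.uniform(0.2, 2.0) for _ in range(n - 1)]  # positive weights
ident = tuple(range(n))
agree = all(math.isclose(transposition_distance(ident, Y, w), weighted_footrule(ident, Y, w))
            for Y in itertools.permutations(range(n)))
print(agree)  # True
```

The lower bound $\delta \ge \delta_S$ holds because one transposition moves two items by at most $w(i,j)$ each; equality is possible because there is always a pair of items, one needing to move right and one needing to move left, that can be swapped without either overshooting its target.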

Given two permutations $A$ and $B$ that differ by a single transposition, of the items in positions
$i < j$, and which are updated to $A'$ and $B'$ using a coupled Markov chain, Bubley and Dyer prove that

$$E[\delta(A',B')] \le \delta(A,B) + \frac{f(i-1)w(i-1) - f(i)w(i) - f(j-1)w(j-1) + f(j)w(j)}{2} = \delta(A,B)\,(1-\gamma_{i,j}),$$

where

$$\gamma_{i,j} = \frac12\cdot\frac{-f(i-1)w(i-1) + f(i)w(i) + f(j-1)w(j-1) - f(j)w(j)}{w(i) + \cdots + w(j-1)}.$$

Again, they show this assuming constant weights, but the same proof holds for general
positive weights $w$.

Letting $\gamma = \min_{i,j}\gamma_{i,j}$, their method of path coupling gives an upper bound of $(1/\gamma)\log(D/\epsilon)$
on the number of steps before the variation distance from stationarity is at most $\epsilon$, where $D$ is
the ratio of the maximum distance to the minimum positive distance. Observe that $\gamma_{i,j} = c$ (resp.
$\gamma_{i,j} \ge c$) for each $i$ and $j$ if and only if $\gamma_{i,i+1} = c$ (resp. $\gamma_{i,i+1} \ge c$) for each $i$.
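To see why the adjacent transpositions control all the others, write $a_k = f(k)w(k)$, with $f(0) = f(n) = 0$; the following expansion is ours. The numerator of $\gamma_{i,j}$ telescopes into a sum of the adjacent numerators, so $\gamma_{i,j}$ is a $w$-weighted average of $\gamma_{i,i+1}, \ldots, \gamma_{j-1,j}$:

$$-a_{i-1} + a_i + a_{j-1} - a_j = \sum_{k=i}^{j-1}\bigl(2a_k - a_{k-1} - a_{k+1}\bigr) = \sum_{k=i}^{j-1} 2\,w(k)\,\gamma_{k,k+1},
\qquad
\gamma_{i,j} = \frac{\sum_{k=i}^{j-1} w(k)\,\gamma_{k,k+1}}{\sum_{k=i}^{j-1} w(k)}.$$

Since the weights are positive, such an average equals $c$ (is at least $c$) whenever every $\gamma_{k,k+1}$ does, and conversely the adjacent values are themselves among the $\gamma_{i,j}$.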

Given constant weights, the choice of frequencies $f$ that maximizes $\gamma$ is given by a parabola.
Therefore Bubley and Dyer chose $f(i) = i(n-i)/K$, where the normalizing constant is $K =
(n^3 - n)/6$. It is easily checked that $\gamma_{i,j} = 1/K$ when $j = i+1$, so this holds for all $i$ and $j$. With
constant weights one can show $D \le \lfloor n^2/4\rfloor$, so their bound on the mixing time is $(1/3+o(1))n^3\log n$.

Given constant frequencies $f$, the optimal choice of $w$ is sinusoidal. Let $w(i) = \cos(\beta(i/n - 1/2))$
with $0 < \beta < \pi$; these weights are positive as required. Since

$$\frac{-\cos(x-\delta) + 2\cos(x) - \cos(x+\delta)}{2\cos(x)} = 1 - \cos(\delta),$$

we have

$$\gamma_{i,i+1} \ge \frac{1-\cos(\beta/n)}{n-1} > \frac{\beta^2}{2n^3}$$

(with equality in the first step except when $i = 1$ or $i = n-1$, since $f(0) = f(n) = 0$), so this
bound holds for all $i$ and $j$. As in the lattice path upper bound, we take $\beta = \pi - \Theta(1/\log n)$ so that $\gamma$
is large, while not making the ratio $D$ of the maximum distance to the minimum positive distance
too large. Then the upper bound on the mixing time is

$$\frac{2n^3}{\beta^2}\log\frac{D}{\epsilon} = \frac{4+o(1)}{\pi^2}\,n^3\log n.$$
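For uniform frequencies and sinusoidal weights, the interior values $\gamma_{i,i+1}$ equal $(1-\cos(\beta/n))/(n-1)$ exactly and the two end values are larger. A floating-point Python sketch of this (our illustration; $n = 50$ and $\beta = \pi - 1/\log n$ are arbitrary choices within $\beta = \pi - \Theta(1/\log n)$, and `gamma_adjacent` is a hypothetical helper):

```python
import math

def gamma_adjacent(f, w, n):
    """gamma_{i,i+1} for i = 1..n-1, using the convention f(0) = f(n) = 0."""
    def a(k):
        return f(k) * w(k) if 1 <= k <= n - 1 else 0.0
    return [(2 * a(i) - a(i - 1) - a(i + 1)) / (2 * w(i)) for i in range(1, n)]

n = 50
beta = math.pi - 1 / math.log(n)          # one choice of beta = pi - Theta(1/log n)

def f(i):
    return 1 / (n - 1)                    # Karzanov-Khachiyan: uniform frequencies

def w(k):
    return math.cos(beta * (k / n - 0.5))   # sinusoidal weights, positive on 0..n

g = gamma_adjacent(f, w, n)
exact = (1 - math.cos(beta / n)) / (n - 1)
print(min(g) >= exact * (1 - 1e-9), exact > beta**2 / (2 * n**3))   # True True
```

The small relative tolerance absorbs rounding in the second differences, which are about three orders of magnitude smaller than the weights themselves.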

Of course when designing a Markov chain, we are free to pick both $f$ and $w$. Optimizing them
together would be an interesting challenge.

7. Sweeps versus independent updates 

So far we have focused on updates where a random site is selected, and then a local randomizing
operation is performed at that site. Often in practice the various sites are updated in systematic
"sweeps" rather than at random. For instance, for permutations or linear extensions, rather than
randomize a random adjacent pair of items, one may instead randomize the items in positions
(1,2), (3,4), (5,6), ..., and then positions (2,3), (4,5), (6,7), .... Likewise for lozenge tilings,
one may randomize the lattice paths at all places where the $x$-coordinate is even, then afterwards
at all places where the $x$-coordinate is odd. Call the first set of updates an even sweep, and the
second set an odd sweep. In all cases where we have derived upper bounds on the mixing time of
a Markov chain (i.e. grand-coupling time for a random path or permutation by random adjacent
transpositions, grand-coupling time for Luby, Randall, and Sinclair's chain on lozenge tilings, and
the mixing time for the Karzanov-Khachiyan Markov chain for random linear extensions) the same
analysis that worked for independent updates at each step also works for sweeps. If one randomly
chooses between even sweeps and odd sweeps, then we have the same contraction property, with $\gamma$
scaled up by a factor of $(n-1)/2$ (or $p/2$ in the case of lozenge tilings). The mixing time bounds are
then roughly the same as, though slightly better than, the bounds we would get when using the same
total number of transpositions (or path pushes) but at independent uniformly random locations
(a sweep of $k$ updates contracts by a factor $1 - k\gamma \le (1-\gamma)^k$). Successive even sweeps are redundant,
as are successive odd sweeps, so when we alternate, we perform about half as many moves (in
continuous time) to get the same value of the total variation distance or probability of not coupling.
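A minimal Python sketch of sweeps for the sort/reverse-sort grand coupling on permutations (our illustration, with hypothetical names; positions are 0-based, so parity 0 covers the pairs (1,2), (3,4), ... in the 1-based numbering above). It runs the coupling from the identity and from the reversal. For every threshold function these two starts map to the lowest and highest lattice paths, so by monotonicity, once they agree every other starting permutation would agree as well.

```python
import random

def sweep(perm, parity, coins):
    """One sweep over the disjoint pairs (x, x+1) with x % 2 == parity (0-based).
    Pair x is sorted if coins[x] is True and reverse-sorted otherwise; the pairs
    do not overlap, so the order in which they are processed is irrelevant."""
    out = list(perm)
    for x in range(parity, len(out) - 1, 2):
        out[x:x + 2] = sorted(out[x:x + 2], reverse=not coins[x])
    return out

def sweeps_to_coalesce(n, rng):
    """Number of random even/odd sweeps until the grand coupling started from
    the identity and from the reversal brings the two permutations together."""
    top, bottom = list(range(n)), list(range(n - 1, -1, -1))
    count = 0
    while top != bottom:
        parity = rng.randrange(2)                            # even or odd sweep
        coins = [rng.random() < 0.5 for _ in range(n - 1)]   # shared by both copies
        top, bottom = sweep(top, parity, coins), sweep(bottom, parity, coins)
        count += 1
    return count

rng = random.Random(2024)
trials = [sweeps_to_coalesce(8, rng) for _ in range(200)]
print(sum(trials) / len(trials))
```

Each sweep draws one direction bit per pair from a fixed schedule of locations, so no random bits are spent choosing sites. This is the saving in random bits mentioned below.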

Note that we have not proved that the mixing time is actually twice as fast, merely that our 
upper bound on it is half as large. 

Perhaps more important than this factor of two savings is that many fewer random bits are 
required to do the updates, since the locations of the updates are deterministic. For Markov chains 
with simple moves such as these, generating pseudorandom bits can take an appreciable fraction of 
the total running time. Whether for this reason or for simplicity, in practice the algorithms used 
to generate random tilings have typically used systematic sweeps. 



8. Lattice paths and permutations revisited 

We have already given a quick-and-easy analysis of the adjacent-transposition Markov chain on 
lattice paths and on permutations, obtaining upper and lower bounds on the mixing time and 
coupling time that match to within constants. In this section we give a more refined analysis which 
improves these constants. 

8.1. Upper bounds. Consider the Markov chain on permutations which exchanges a random 
adjacent pair with probability 1/2, and the pairwise coupling for which the choice of adjacent pair 
is always the same in the two Markov chains, and the decision of whether or not to exchange is 
also the same in both chains, unless an exchange in one chain but not the other would decrease the 
Hamming distance between the two permutations, in which case the exchange is done in a random 
one of the two permutations but not the other. This coupling was also used by Aldous (1983).



Let us focus on how a given item moves in the two permutations. The state space is the $n \times n$
grid, representing the location of the item in the two different permutations. A typical state $(x, y)$
transits to its four neighboring states $(x\pm1, y)$ and $(x, y\pm1)$, each with probability $a = 1/(2(n-1))$,
with the following exceptions: (1) if $x = y$ then the transitions are to $(x+1, y+1)$ and $(x-1, y-1)$,
each with probability $a$, and (2) if the transition would be to a pair outside the $n \times n$ grid, the
transition instead self-loops. Observe that the $x$-coordinate is a simple random walk on a chain of
length $n$, and similarly for the $y$-coordinate. Furthermore, any state $(x, y)$ with $x \ne y$ is transient,
so that eventually $x = y$. This will take about $\Theta(n^2/a)$ steps. We will need a good estimate of the
probability that coalescence has not occurred after a large multiple of $\Theta(n^2/a)$ steps, so we prove
the following lemma:

Lemma 9. After $T$ steps, $\Pr[x_T \ne y_T] \le 10\exp[-T(1-\cos(\pi/n))/(n-1)]$.
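The decay rate in Lemma 9 can be seen directly from the chain. By our own calculation (not a step quoted from the proof), $\varphi(x,y) = c(x) - c(y)$ with $c(z) = \cos(\pi(z - 1/2)/n)$ is an exact eigenfunction of the grid chain described above, with eigenvalue $1 - (1-\cos(\pi/n))/(n-1)$; the self-loops at the edges are harmless because $c(0) = c(1)$ and $c(n+1) = c(n)$, and $\varphi$ vanishes on the diagonal. The Python sketch below (hypothetical helper names; positions $1, \ldots, n$) builds the chain exactly as described, checks the eigenfunction identity at every state, and prints $\Pr[x_T \ne y_T]$ from the corner $(1, n)$ next to the lemma's bound. It does not prove the constant 10.

```python
import math

def step_distribution(state, n):
    """One step of the coupled grid walk on {1..n}^2 described in the text."""
    a = 1 / (2 * (n - 1))
    x, y = state
    if x == y:                      # exception (1): the item moves in both copies
        moves = [(x + 1, y + 1), (x - 1, y - 1)]
    else:
        moves = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    out = {}
    for nx, ny in moves:            # exception (2): leaving the grid self-loops
        target = (nx, ny) if 1 <= nx <= n and 1 <= ny <= n else (x, y)
        out[target] = out.get(target, 0.0) + a
    out[(x, y)] = out.get((x, y), 0.0) + 1 - a * len(moves)
    return out

def check_eigenfunction(n):
    """Test E[phi(next)] = lam * phi(current) at every state of the grid."""
    def phi(s):
        return math.cos(math.pi * (s[0] - 0.5) / n) - math.cos(math.pi * (s[1] - 0.5) / n)
    lam = 1 - (1 - math.cos(math.pi / n)) / (n - 1)
    return all(
        math.isclose(sum(p * phi(t) for t, p in step_distribution((x, y), n).items()),
                     lam * phi((x, y)), abs_tol=1e-12)
        for x in range(1, n + 1) for y in range(1, n + 1))

def prob_not_coalesced(n, T, start):
    """Exact Pr[x_T != y_T] by iterating the transition probabilities T times."""
    dist = {start: 1.0}
    for _ in range(T):
        new = {}
        for s, p in dist.items():
            for t, q in step_distribution(s, n).items():
                new[t] = new.get(t, 0.0) + p * q
        dist = new
    return sum(p for (x, y), p in dist.items() if x != y)

n = 10
print(check_eigenfunction(n))
for T in (0, 200, 400, 800):
    bound = 10 * math.exp(-T * (1 - math.cos(math.pi / n)) / (n - 1))
    print(T, prob_not_coalesced(n, T, (1, n)), bound)
```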

Before proving this lemma, we derive from it our bounds on the coupling times. 

Theorem 10. For the above pairwise coupling on permutations, the number of steps before the
probability of coalescence is at least $1-\epsilon$ is at most $(2/\pi^2 + o(1))n^3\log(10n/\epsilon)$.

Proof. After $T$ steps the probability that the two permutations are still different is at most the
expected number of items that are in different positions in the two permutations, which is at most
$10n\exp[-T(1-\cos(\pi/n))/(n-1)]$. Setting this equal to $\epsilon$ gives the desired bound. □
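Spelling out the last step (our own arithmetic): solving $10n\exp[-T(1-\cos(\pi/n))/(n-1)] = \epsilon$ for $T$ and expanding $1-\cos(\pi/n) = \pi^2/(2n^2) + O(n^{-4})$ gives

$$T = \frac{n-1}{1-\cos(\pi/n)}\log\frac{10n}{\epsilon} = \frac{2n^2(n-1)}{\pi^2}\bigl(1+O(n^{-2})\bigr)\log\frac{10n}{\epsilon} = \Bigl(\frac{2}{\pi^2}+o(1)\Bigr)n^3\log\frac{10n}{\epsilon}.$$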

For the permutation Markov chain, the $(2/\pi^2 + o(1))n^3\log n$ upper bound for the variation
distance threshold and the $(4/\pi^2 + o(1))n^3\log n$ upper bound for the separation distance threshold
follow immediately. The same bounds for the lattice path Markov chain follow from projecting the
permutation to the lattice path.

Theorem 11. For the sort/reverse-sort grand coupling on permutations, the number of steps before
the probability of coalescence is at least $1-\epsilon$ is at most $(4/\pi^2 + o(1))n^3\log(10n/\epsilon)$. For lattice paths
the sort/reverse-sort grand coupling time is at most $(2/\pi^2 + o(1))n^3\log(10n/\epsilon)$.



Proof. What we have analyzed in Theorem 10 is a pairwise coupling of a Markov chain on permu- 
tations, i.e. an update rule that updates pairs of permutations. This pairwise update rule does not 
extend to a grand coupling, i.e., there is no update rule defined on all permutations such that pairs 
of permutations evolve according to the above pairwise coupling. But let us look at the evolution 
of a threshold function of the two permutations. Normally both permutations are either sorted or 
reverse-sorted at a location, which corresponds to a push-down or push-up move in the threshold 
functions. The exceptional case where one permutation is sorted while the other is reverse-sorted 
occurs when in one permutation items i and j are in adjacent locations x and x + 1, while the 
other permutation has items j and k at these locations, and either i<j<k or k<j<i. Any 
given threshold function will map j to either 0 or 1, and then one of the two permutations will have 
either an up-slope or down-slope at locations x and x + 1, and for that permutation an observer 
would be unable to tell whether a sort or reverse-sort operation was performed. Thus from the 
standpoint of an observer, it appears as if the two threshold functions were evolving according to 
the monotone grand coupling considered earlier, and our bound on the pairwise coupling time for 
permutations translates to a bound on the grand coupling time for lattice paths. 

Next we can convert the bound on grand coupling time for lattice paths into an upper bound on 
the grand coupling time for permutations for the straightforward sort/reverse-sort coupling. After 
(4/π² + o(1))n³ log n steps the pairwise permutation coupling (and hence the lattice path grand 
coupling) has coalesced except with probability ≪ 1/n, so by a union bound over the n − 1 threshold 
functions the permutation grand coupling has coalesced except with probability ≪ 1. □ 

Surprisingly, experiments suggest rather strongly that 4/π² is in fact the correct constant, so that 
essentially nothing was lost in converting the coupling time bounds from permutations to lattice 
paths and back to permutations. 

Proof of Lemma 9. Since we are interested in the probability that the random walk has not hit the 
diagonal, and the regions below and above the diagonal behave symmetrically, let us consider the 
state transition matrix M_n for the random walk above the diagonal (x < y), where the random 
walker reflects off the boundaries of the grid, and dies when it hits the diagonal. The matrix 
M_n resembles a stochastic matrix, except that for those rows corresponding to states next to the 
diagonal, the row-sum will be less than one. For the reader's convenience we proceed to diagonalize 
the matrix M_n; other triangular regions with different boundary conditions have been similarly 
diagonalized in e.g. (Kenyon, Propp, and Wilson, 2000). 



For 0 ≤ j < k < n, let the function f_{j,k}(x,y) be defined by 

$$f_{j,k}(x,y) = \cos(j\pi x/n)\cos(k\pi y/n) - \cos(j\pi y/n)\cos(k\pi x/n).$$

For convenience let the values of the grid coordinates x and y range from 1/2 to n − 1/2. Since 

$$f_{j,k}(x-1,y) + f_{j,k}(x+1,y) + f_{j,k}(x,y-1) + f_{j,k}(x,y+1) - 4f_{j,k}(x,y) = [2\cos(j\pi/n) + 2\cos(k\pi/n) - 4]\,f_{j,k}(x,y),$$

f_{j,k} is an eigenvector of the nearest-neighbor random walk on Z² with transition probabilities a, 
and its eigenvalue is 

$$\lambda_{j,k} = 1 + a[2\cos(j\pi/n) + 2\cos(k\pi/n) - 4].$$

Since furthermore f_{j,k}(x,x) = 0, f_{j,k}(x,y) = f_{j,k}(−x,y), f_{j,k}(x,y) = f_{j,k}(x,−y), f_{j,k}(x,y) = 
f_{j,k}(2n − x,y), and f_{j,k}(x,y) = f_{j,k}(x,2n − y), it follows that f_{j,k} is also an eigenvector of M_n 
with eigenvalue λ_{j,k}. 

Next we show that any two of these n(n − 1)/2 eigenvectors are orthogonal, and that each is not 
identically zero. Writing c_j(x) = cos(jπx/n), so that f_{j,k}(x,y) = c_j(x)c_k(y) − c_j(y)c_k(x), and letting x and y range over 1/2, 3/2, ..., n − 1/2, we have 

$$\begin{aligned}
\sum_{1/2\le x<y\le n-1/2} f_{j_1,k_1}(x,y)\,f_{j_2,k_2}(x,y)
&= \sum_{x<y}\big[c_{j_1}(x)c_{k_1}(y)c_{j_2}(x)c_{k_2}(y) + c_{j_1}(y)c_{k_1}(x)c_{j_2}(y)c_{k_2}(x)\\
&\qquad\quad - c_{j_1}(x)c_{k_1}(y)c_{j_2}(y)c_{k_2}(x) - c_{j_1}(y)c_{k_1}(x)c_{j_2}(x)c_{k_2}(y)\big]\\
&= \sum_{x,y}\big[c_{j_1}(x)c_{k_1}(y)c_{j_2}(x)c_{k_2}(y) - c_{j_1}(x)c_{k_1}(y)c_{j_2}(y)c_{k_2}(x)\big]\\
&= \sum_{x} c_{j_1}(x)c_{j_2}(x)\sum_{y} c_{k_1}(y)c_{k_2}(y) - \sum_{x} c_{j_1}(x)c_{k_2}(x)\sum_{y} c_{k_1}(y)c_{j_2}(y)\\
&= \frac{n}{2}\big(1_{j_1=j_2}+1_{j_1=j_2=0}\big)\,\frac{n}{2}\big(1_{k_1=k_2}+1_{k_1=k_2=0}\big)
 - \frac{n}{2}\big(1_{j_1=k_2}+1_{j_1=k_2=0}\big)\,\frac{n}{2}\big(1_{k_1=j_2}+1_{k_1=j_2=0}\big).
\end{aligned}$$

(In the second step the four terms pair up under the exchange x ↔ y, and the remaining summand vanishes when x = y.) 
Since j_1 < k_1 and j_2 < k_2 the second term is zero. The first term is also zero unless both j_1 = j_2 and 
k_1 = k_2, giving us orthogonality. If (j_1,k_1) = (j_2,k_2) = (j,k), then we find that this inner product 
is (1 + 1_{j=0})n²/4, so the eigenvectors are nontrivial. Hence we have an orthogonal eigenbasis of 
the matrix M_n. 
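As a sanity check on this diagonalization, the following Python sketch (our own helper names; we index the half-integer coordinates x = p + 1/2, y = q + 1/2 by integers 0 ≤ p < q < n) applies M_n with a = 1/(2(n − 1)) to each f_{j,k} for a small n, and checks the eigenvalue λ_{j,k}, pairwise orthogonality, and the norm (1 + 1_{j=0})n²/4.

```python
import math
from itertools import combinations

def f(j, k, n, p, q):
    """f_{j,k}(x, y) at grid point x = p + 1/2, y = q + 1/2 (p, q in 0..n-1)."""
    x, y = p + 0.5, q + 0.5
    return (math.cos(j * math.pi * x / n) * math.cos(k * math.pi * y / n)
            - math.cos(j * math.pi * y / n) * math.cos(k * math.pi * x / n))

def apply_M(g, n, a, p, q):
    """(M_n g)(p, q) for the walk above the diagonal: reflect at the grid edge, die on x = y."""
    total = (1 - 4 * a) * g(p, q)             # no move this step
    for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        pp, qq = p + dp, q + dq
        if not (0 <= pp < n and 0 <= qq < n):
            total += a * g(p, q)              # move would leave the grid: self-loop
        elif pp != qq:                        # moves onto the diagonal are killed
            total += a * g(pp, qq)
    return total

def check_spectrum(n):
    """Return (max eigen-equation error, max off-diagonal inner product, norms correct?)."""
    a = 1 / (2 * (n - 1))
    states = [(p, q) for p in range(n) for q in range(p + 1, n)]
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    eig_err = 0.0
    for j, k in pairs:
        lam = 1 + a * (2 * math.cos(j * math.pi / n) + 2 * math.cos(k * math.pi / n) - 4)
        g = lambda p, q, j=j, k=k: f(j, k, n, p, q)
        for p, q in states:
            eig_err = max(eig_err, abs(apply_M(g, n, a, p, q) - lam * g(p, q)))
    orth_err = max(
        (abs(sum(f(j1, k1, n, p, q) * f(j2, k2, n, p, q) for p, q in states))
         for (j1, k1), (j2, k2) in combinations(pairs, 2)),
        default=0.0)
    norms_ok = all(
        abs(sum(f(j, k, n, p, q) ** 2 for p, q in states) - (1 + (j == 0)) * n * n / 4) < 1e-9
        for j, k in pairs)
    return eig_err, orth_err, norms_ok

if __name__ == "__main__":
    print(check_spectrum(6))
```

The boundary handling in `apply_M` is exactly where the reflection symmetries f(x,y) = f(−x,y) and f(x,y) = f(x,2n − y) are used.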

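Suppose that the random walker starts at (x_0, y_0). Let δ_{x_0,y_0}(x,y) be the function which is 1 
at the starting location and zero elsewhere. Let J(x,y) denote the function which is 1 whenever 
x < y. Since M_n is symmetric, expanding δ_{x_0,y_0} in the orthogonal eigenbasis gives 

$$\Pr[x_T \ne y_T] = (\delta_{x_0,y_0} M_n^T)\cdot J
= \Big(\sum_{0\le j<k<n} \frac{\delta_{x_0,y_0}\cdot f_{j,k}}{f_{j,k}\cdot f_{j,k}}\,\lambda_{j,k}^T\, f_{j,k}\Big)\cdot J
= \sum_{0\le j<k<n} \frac{(\delta_{x_0,y_0}\cdot f_{j,k})(J\cdot f_{j,k})}{f_{j,k}\cdot f_{j,k}}\,\lambda_{j,k}^T .$$

Since |f_{j,k}| ≤ 2, we have |δ_{x_0,y_0}·f_{j,k}| ≤ 2 and |J·f_{j,k}| ≤ n(n − 1), while f_{j,k}·f_{j,k} ≥ n²/4; hence each 
coefficient in this sum is at most 8 in absolute value, and Pr[x_T ≠ y_T] ≤ 8 Σ_{0≤j<k<n} |λ_{j,k}|^T. 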

We need a bound on λ_{j,k} to bound this summation, and to this end consider the line passing 
through (0, cos 0) and (t, cos t). When t = π/2, the line is at least as high as cos s when t ≤ s ≤ π. 
If t < π/2, the line's slope increases towards 0, so it continues to be above cos s when π/2 ≤ s ≤ π, 
and by concavity of cos s for 0 ≤ s ≤ π/2, the line is also above cos s when t ≤ s ≤ π/2. Taking 
t = π/n (assume n ≥ 2; the lemma is trivial if n = 1) and s = jπ/n, we have 

$$\begin{aligned}
\cos(j\pi/n) &\le 1 - \frac{j\pi/n}{\pi/n}\,(1-\cos(\pi/n)) \\
2\cos(j\pi/n) - 2 &\le -2j(1-\cos(\pi/n)) \\
2\cos(j\pi/n) + 2\cos(k\pi/n) - 4 &\le -2(j+k)(1-\cos(\pi/n)) \\
\lambda_{j,k} &\le 1 - 2a(j+k)(1-\cos(\pi/n)) .
\end{aligned}$$

Let c = 2a(1 − cos(π/n)) ≈ aπ²/n², so that 

$$\lambda_{j,k} \le 1 - (j+k)c \le \exp[-(j+k)c] .$$

Since a = 1/(2(n − 1)), λ_{j,k} ≥ 1 − 8/[2(n − 1)] ≥ 0 when n ≥ 5, and λ_{j,k} ≥ 0 for n = 2, 3, 4 as well, 
so we may take the Tth power of both sides to get 

$$\sum_{0\le j<k<n}\lambda_{j,k}^T \le \sum_{0\le j<k<n}\exp[-(j+k)cT] \le \sum_{j\ge 0}\frac{\exp[-(2j+1)cT]}{1-\exp(-cT)} = \frac{\exp(-cT)}{[1-\exp(-2cT)]\,[1-\exp(-cT)]} .$$

The lemma is trivial unless exp[−cT] < 1/10, in which case 8/[1 − exp(−2cT)]/[1 − exp(−cT)] < 
8 · 100/99 · 10/9 = 8000/891 < 10, so that 

$$\Pr[x_T \ne y_T] < 10\exp[-cT] . \qquad \square$$
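For small n one can compute Pr[x_T ≠ y_T] exactly by evolving the killed walk, and compare it both with Lemma 9 and with the eigenvalue bound 0 ≤ λ_{j,k} ≤ 1 − (j + k)c used in the proof. The Python sketch below does this under the transition rule stated before Lemma 9; the function names are hypothetical.

```python
import math

def worst_survival(n, T):
    """max over starts (x0 < y0) of Pr[x_T != y_T], computed exactly by evolving the
    killed walk of Lemma 9 (four moves of probability a = 1/(2(n-1)) each; moves off
    the n x n grid self-loop; reaching x = y means coalescence)."""
    a = 1 / (2 * (n - 1))
    states = [(p, q) for p in range(n) for q in range(p + 1, n)]
    worst = 0.0
    for start in states:
        dist = {start: 1.0}
        for _ in range(T):
            new = {}
            for (p, q), mass in dist.items():
                new[(p, q)] = new.get((p, q), 0.0) + (1 - 4 * a) * mass
                for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    pp, qq = p + dp, q + dq
                    if not (0 <= pp < n and 0 <= qq < n):
                        pp, qq = p, q                 # self-loop at the grid boundary
                    elif pp == qq:
                        continue                      # coalesced: mass leaves x < y
                    new[(pp, qq)] = new.get((pp, qq), 0.0) + a * mass
            dist = new
        worst = max(worst, sum(dist.values()))
    return worst

def lemma9_bound(n, T):
    return 10 * math.exp(-T * (1 - math.cos(math.pi / n)) / (n - 1))

def eigenvalue_bound_holds(n, tol=1e-12):
    """Check 0 <= lambda_{j,k} <= 1 - (j + k) c for all 0 <= j < k < n."""
    a = 1 / (2 * (n - 1))
    c = 2 * a * (1 - math.cos(math.pi / n))
    return all(
        -tol <= 1 + a * (2 * math.cos(j * math.pi / n) + 2 * math.cos(k * math.pi / n) - 4)
        <= 1 - (j + k) * c + tol
        for j in range(n) for k in range(j + 1, n))

if __name__ == "__main__":
    for T in (50, 200, 400):
        print(T, worst_survival(6, T), lemma9_bound(6, T))
```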

8.2. Lower bounds. 

Theorem 12. For the Markov chain on lattice paths in an n/2 × n/2 box, the time it takes the top 
path and bottom path to coalesce is with high probability at least (1 − o(1))(2/π²)n³ log n. 

Proof. As an upper lattice path ĥ and a lower lattice path ȟ evolve together via the push-down/ 
push-up coupling, let us look at the difference path h = ĥ − ȟ. If ĥ goes up and ȟ goes down, then 
the difference path h goes up, which we denote with U. If ĥ goes down 
and ȟ goes up, then h goes down (D). In the remaining two cases (both paths go up, or both go down) the difference 
path remains flat (F). We may view the difference path as a string of U, F, and D particles, and 
it is easy to check that the evolution of the difference path is a Markov process. If the particles 
at the updated site are UU (ĥ goes up twice while ȟ goes down twice), then they remain UU. If a UD is updated (a peak of ĥ above a valley of ȟ), 
then either both paths become peaks or both become valleys, so the result is FF. If a UF is updated, the underlying paths might be 
ĥ = up-up above ȟ = down-up, or they might be ĥ = up-down above ȟ = down-down; in either case they change to FU or UF. 
Likewise, if an FF is updated, there are four possibilities for the underlying paths, and in each case 
the updated configuration is FF. The other cases (DD, DU, DF, FU, and FD) are similar and related 
to the above cases by symmetry. We may summarize the update rules for the string of U's, D's, 
and F's as follows: pick a random adjacent pair, and with probability 1/2 exchange them; when a 
D and U are exchanged past each other, they both turn into F's. If we start with the top path and 
bottom path, then in the difference path every U will be to the left of every D. 
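A minimal simulation sketch of the push-down/push-up coupling, written in Python. We assume, consistently with a = 1/(2(n − 1)) in § 8.1, that each step picks one of the n − 1 locations uniformly and then, with probability 1/2 each, pushes both paths up or both paths down there; the helper names are ours. The code records the difference path as a string of U, F, and D particles and returns the coalescence time of the top and bottom paths.

```python
import random

def push(path, i, up):
    """Push-up (make a peak) or push-down (make a valley) between steps i and i+1;
    a straight segment (two equal steps) is left unchanged."""
    if path[i] != path[i + 1]:
        path[i], path[i + 1] = (1, -1) if up else (-1, 1)

def difference_string(upper, lower):
    """Encode the difference path upper - lower as U / F / D particles, one per step."""
    return "".join("U" if s > t else "D" if s < t else "F" for s, t in zip(upper, lower))

def heights(path):
    out = [0]
    for s in path:
        out.append(out[-1] + s)
    return out

def coalescence_time(n, rng, max_steps=10**7):
    """Evolve the top and bottom paths (n/2 up-steps, n/2 down-steps) under the
    push-up/push-down coupling; return the first time they agree."""
    upper = [1] * (n // 2) + [-1] * (n // 2)
    lower = [-1] * (n // 2) + [1] * (n // 2)
    for t in range(1, max_steps + 1):
        i = rng.randrange(n - 1)          # location between steps i and i+1
        up = rng.random() < 0.5           # the same coin is used for both paths
        push(upper, i, up)
        push(lower, i, up)
        # the coupling is monotone: the upper path never drops below the lower one
        assert all(a >= b for a, b in zip(heights(upper), heights(lower)))
        if upper == lower:
            return t
    return None

if __name__ == "__main__":
    print(difference_string([1, 1, 1, -1, -1, -1], [-1, -1, -1, 1, 1, 1]))  # UUUDDD
    print(coalescence_time(8, random.Random(1)))
```

Starting from the top and bottom paths the difference path is all U's followed by all D's, matching the last sentence of the paragraph above.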

We will do a number of comparisons between a random permutation σ and the difference lattice 
path h. For the kth comparison (0 < k < n), look at the locations of cards 1, ..., k, and in particular 
their relative order. Let T_U(1) denote the first of these cards encountered in a left-to-right scan, 
and in general T_U(i), 1 ≤ i ≤ k, denotes the ith such card encountered. Label the first k U's of the 
difference path with the numbers T_U(1), ..., T_U(k). Similarly let T_D(i), 1 ≤ i ≤ k, denote the ith 
card from the cards n + 1 − k, ..., n to be encountered in a right-to-left scan of the permutation, 
and label the ith to last D of the difference path with T_D(i). We leave the remaining particles of 
the difference path unlabeled. When we evolve the difference path via random exchanges, we will 
let labeled particles be exchanged past each other, but a labeled U or D may not be exchanged 
past an unlabeled U or D. This rule for the labels does not affect the evolution of the unlabeled 
difference path, but it is important for our understanding of it. 

Initially for each 1 ≤ i ≤ k, the position of card T_U(i) in the difference path is weakly to the left 
of card T_U(i) in the permutation, while the position of card T_D(i) in the difference path is weakly 
to the right of card T_D(i) in the permutation. We will pick the same random adjacent pair in the 
permutation as in the labeled difference path, and make the same decision as to whether or not to 
exchange the adjacent items. Consider the first time that the above invariant fails to hold, say that 
card T_U(i) in the labeled difference path moves to the right of card T_U(i) in the permutation. On the 
previous step, card T_U(i) was in the same location in the difference path and the permutation. The 
exchange could not have been to the right of card T_U(i), because exchanges in the permutation 
always succeed, nor could the exchange be to the left of card T_U(i), as any particle to the left of card 
T_U(i) is either an F or a labeled U, and such exchanges succeed. Thus the invariant is maintained. 

Consider the locations of two cards i and j in a random permutation σ, or two labels in the 
difference path h. Let the weighted gap between them be defined by wgap(i,j) = Σ_x sin(πx/n), 
where the sum is taken over positions x between the two cards, and is negative if card j occurs 
before card i. Within a random permutation σ we have 

$$E[|\mathrm{wgap}_\sigma(i,j)|] = (1+o(1))\,2n\int_0^1 u(1-u)\sin(\pi u)\,du = \Big(\frac{8}{\pi^3} + o(1)\Big)n .$$

The area under the difference path is the sum of the locations of the D particles minus the sum 
of the locations of the U particles. The potential function (the weighted area) is 

$$\Phi(h) = \sum_{x=0}^{n} h(x)\sin\frac{\pi x}{n} = \sum_{i=1}^{\max_x h(x)} \mathrm{wgap}_h(i\text{th U},\, i\text{th D}) \ge \sum_{i=1}^{k} \mathrm{wgap}_h(i,\, n+1-i)$$

if k ≤ max_x h(x). As wgap_h(i, n + 1 − i) ≥ wgap_σ(i, n + 1 − i) and also wgap_h(i, n + 1 − i) ≥ 0 
whenever i ≤ max_x h(x), we have 

$$\Phi(h) \ge \sum_{i=1}^{k} \max(0,\, \mathrm{wgap}_\sigma(i,\, n+1-i))$$

for any k ≤ max_x h(x), so 

$$\Phi(h) \ge \sum_{i=1}^{n/2} 1_{i\le \max_x h(x)}\, \max(0,\, \mathrm{wgap}_\sigma(i,\, n+1-i)) .$$
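The constant 8/π³ can be checked by computing E[|wgap_σ(i,j)|] exactly for moderate n. In this Python sketch we assume positions 1, ..., n and count only positions strictly between the two cards (either convention gives the same leading term); `expected_abs_wgap` is a hypothetical name.

```python
import math

def expected_abs_wgap(n):
    """Exact E|wgap(i, j)| for two distinct cards at uniformly random positions.

    Assumption: positions are 1..n and the sum runs over positions x strictly
    between the two cards.  For a fixed x, exactly (x - 1) * (n - x) of the
    n(n-1)/2 unordered position pairs {p, q} satisfy p < x < q.
    """
    pairs = n * (n - 1) // 2
    total = sum(math.sin(math.pi * x / n) * (x - 1) * (n - x) for x in range(1, n + 1))
    return total / pairs

if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, expected_abs_wgap(n) / (8 * n / math.pi ** 3))
```

The printed ratios approach 1, the relative error being of order 1/n.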
Since we started with a random permutation and the dynamics are reversible, then even conditional 
upon all the past moves, the permutation is still uniformly random. In particular wgap_σ(i, n + 1 − i) 
is independent of the maximum height max_x h(x) of the difference path, so that 

$$\begin{aligned}
E[\Phi(h)] &\ge \sum_{i=1}^{n/2} \Pr[i \le \max_x h(x)]\; E[\max(0,\, \mathrm{wgap}_\sigma(i,\, n+1-i))] \\
E[\Phi(h)] &\ge E[\max_x h(x)]\,\Big[\Big(\frac{8}{\pi^3} + o(1)\Big)\frac{n}{2}\Big] .
\end{aligned}$$

Since Φ is an eigenfunction, E[Φ(h_t)] = (1 − γ)^t Φ(h_0), and therefore 

$$E[\max_x h_t(x)] \le (1+o(1))\,\frac{\pi^3}{4n}\,\Phi(h_0)(1-\gamma)^t ,$$

and since Φ(h_0) = (2/π²)n², 

$$E[\max_x h_t(x)] \le (1+o(1))\,\frac{\pi}{2}\,n(1-\gamma)^t .$$

Note that this gives another proof that coalescence is likely after t = (2/π² + o(1))n³ log n steps. 

Notice that the difference path never changes by more than one at a time, and only if a U or 
D particle moves. There are 2 max_x h(x) U and D particles, each particle can move in one of two 
directions, and a given proposed exchange occurs with probability 1/2. Thus 

$$E[(\Delta\Phi)^2 \mid h(\cdot)] \le 2\max_x h(x)/(n-1) .$$

Consequently 

$$\begin{aligned}
E[\Phi^2(h_t)\mid h_{t-1}] &= (1-2\gamma)\Phi^2(h_{t-1}) + E[(\Delta\Phi)^2 \mid h_{t-1}(\cdot)] \\
E[\Phi^2(h_t)] &\le (1-2\gamma)E[\Phi^2(h_{t-1})] + E\big[E[(\Delta\Phi)^2 \mid h_{t-1}(\cdot)]\big] \\
E[\Phi^2(h_t)] &\le (1-2\gamma)E[\Phi^2(h_{t-1})] + (\pi + o(1))(1-\gamma)^{t-1}
\end{aligned}$$

where here the o(1) term depends only on n. By induction 

$$E[\Phi^2(h_t)] \le (1-2\gamma)^t\Phi^2(h_0) + (\pi + o(1))(1-\gamma)^t/\gamma .$$

Subtracting E[Φ(h_t)]² = (1 − γ)^{2t}Φ²(h_0), 

$$\begin{aligned}
\mathrm{Var}[\Phi(h_t)] &\le (\pi + o(1))(1-\gamma)^t/\gamma \\
\mathrm{Var}[\Phi(h_t)] &\le (2/\pi + o(1))\,n^3(1-\gamma)^t \\
\mathrm{Var}[\Phi(h_t)] &\le (\pi + o(1))\,n\,E[\Phi(h_t)] .
\end{aligned}$$

Thus if E[Φ(h_t)] ≫ n, then w.h.p. Φ(h_t) > 0, so that the time until coalescence is w.h.p. at least 
(1 − o(1))(2/π²)n³ log n. □ 
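The proof uses two facts about the weighted area: Φ(h_0) ≈ (2/π²)n² for h_0(x) = min(x, n − x), and E[Φ(h_t)] = (1 − γ)^tΦ(h_0). The Python sketch below checks both for a single lattice path, assuming the update rule of § 8.1 (a location 1 ≤ i ≤ n − 1 is chosen uniformly and the path is pushed up or down there with probability 1/2 each), which gives γ = (1 − cos(π/n))/(n − 1); by linearity the same relation holds for the difference path. Function names are ours.

```python
import math
import random

def phi(h, n):
    """Weighted area Phi(h) = sum_x h(x) sin(pi x / n) for heights h[0..n]."""
    return sum(h[x] * math.sin(math.pi * x / n) for x in range(n + 1))

def expected_phi_after_step(h, n):
    """Exact E[Phi(h') | h] under the assumed rule: pick i in 1..n-1 uniformly,
    then push h(i) up or down with probability 1/2 each."""
    total = 0.0
    for i in range(1, n):
        for up in (True, False):
            g = list(h)
            if h[i - 1] == h[i + 1]:              # a peak or valley can be flipped
                g[i] = h[i - 1] + (1 if up else -1)
            total += phi(g, n) / (2 * (n - 1))
    return total

def random_path(n, rng):
    """Uniformly random lattice path with n/2 up-steps and n/2 down-steps."""
    steps = [1] * (n // 2) + [-1] * (n // 2)
    rng.shuffle(steps)
    h = [0]
    for s in steps:
        h.append(h[-1] + s)
    return h

if __name__ == "__main__":
    n = 2000
    h0 = [min(x, n - x) for x in range(n + 1)]
    print(phi(h0, n) / n ** 2, 2 / math.pi ** 2)
```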

9. Exclusion and interchange processes 
In this section we show how to apply Lemma 5 to lower bound the convergence rate of exclusion 
and interchange processes. Several of these Markov chains were studied by Diaconis and Saloff- 
Coste (1993a), who derived upper bounds on their mixing times but did not have lower bounds 
that matched to within constant factors. The lower bounds derived here match (most of) their 
upper bounds to within constant factors. 

The interchange process describes particles moving around on an undirected (but possibly 
weighted) graph. At each time step, a random edge of the graph is selected, and the particles 
at either endpoint of that edge are exchanged. The particles could be 0's and 1's, or they could, 
for instance, be distinct numbers from 1 up to the number n of vertices. 
Lee and Yau (1998) studied the logarithmic-Sobolev constants and the L²-mixing times of some 
exclusion and exchange processes. The exclusion process may also be viewed as the infinite- 
temperature limit of Kawasaki dynamics for the Ising model; see e.g. (Cancrini and Martinelli, 
2000) or (Lu and Yau, 1993). 



Let Q_{x,y} denote the probability that the Markov chain exchanges the particles at locations x and 
y, and for convenience let Q_{x,x} = 1 − Σ_{y≠x} Q_{x,y}. Then Q is the state-transition matrix for the 
location of a particular particle. Let v be a right-eigenvector of the matrix Q (the matrix is symmetric, 
so the left and right eigenvectors are the same) with eigenvalue 1 − γ. The lemma requires γ > 0, 
but in general one would expect that eigenvectors with smaller γ's will give better lower bounds. 

For convenience let us assume for the moment that all the particles are distinguishable, so that 
the state of the Markov chain is a permutation σ of size n, where σ(j) denotes the location of particle j. Define 

$$\Phi(\sigma) = \sum_{j=1}^{k} v_{\sigma(j)}$$

and Φ_max = max_σ Φ(σ). After one step of the Markov chain we have 

$$E[\Phi(\sigma')\mid\sigma] = \sum_{j=1}^{k} E[v_{\sigma'(j)}\mid\sigma] = \sum_{j=1}^{k}\sum_{y} Q_{\sigma(j),y}\,v_y = \sum_{j=1}^{k}(1-\gamma)\,v_{\sigma(j)} = (1-\gamma)\Phi(\sigma),$$

so that this Φ has the contraction property required by Lemma 5 (assuming 0 < γ ≤ 2 − √2). In 
effect we used an eigenvector of the graph to define an eigenvector of the interchange process. For 
further information comparing the eigenvalues of the graph with those of the interchange process, 
see e.g. Handjani and Jungreis (1996). 
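The contraction identity E[Φ(σ')|σ] = (1 − γ)Φ(σ) can be verified exactly on a small example. In this Python sketch (function names are ours) the graph is a path of n sites with Q_{x,x+1} = a/(n − 1), for which v_x = cos(π(x + 1/2)/n) is an eigenvector of Q with γ = 2a(1 − cos(π/n))/(n − 1); any other graph and eigenvector could be substituted.

```python
import math
import random

def interchange_expectation(Q, v, sigma, k):
    """Exact E[Phi(sigma') | sigma] after one step of the interchange process.

    Q maps an edge (x, y) to Q_{x,y}, the probability that this step exchanges the
    particles at x and y (the leftover probability is a self-loop).  sigma[j] is the
    location of particle j, and Phi(sigma) = sum over j < k of v[sigma[j]].
    """
    def phi(locs):
        return sum(v[locs[j]] for j in range(k))
    expected = (1 - sum(Q.values())) * phi(sigma)
    for (x, y), q in Q.items():
        swap = {x: y, y: x}
        expected += q * phi([swap.get(loc, loc) for loc in sigma])
    return expected

def path_graph_example(n=7, a=0.5, k=3, seed=0):
    """Adjacent transpositions on a path of n sites; returns (E[Phi(sigma')], (1-gamma)Phi(sigma))."""
    Q = {(x, x + 1): a / (n - 1) for x in range(n - 1)}
    v = [math.cos(math.pi * (x + 0.5) / n) for x in range(n)]
    gamma = 2 * a * (1 - math.cos(math.pi / n)) / (n - 1)
    sigma = list(range(n))
    random.Random(seed).shuffle(sigma)
    phi = sum(v[sigma[j]] for j in range(k))
    return interchange_expectation(Q, v, sigma, k), (1 - gamma) * phi

if __name__ == "__main__":
    print(path_graph_example())
```

With a = 1/2 this γ coincides with (1 − cos(π/n))/(n − 1), the gap used for adjacent transpositions in § 8.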



With ΔΦ = Φ(σ') − Φ(σ) denoting the change in Φ that occurs after one step of the Markov 
chain, let R be an upper bound on the largest value that E[(ΔΦ)² | σ] can take. In general we can 
take 

$$R \le \max_{x,y:\ Q_{x,y}>0} (v_x - v_y)^2 ,$$

but in some cases we might find a better bound. 

In general it need not be the case that all the particles are distinguishable. If there is a set A of 
k particles that are distinguishable from the remaining n − k particles, then we can still define 

$$\Phi(\text{state}) = \sum_{x:\ \text{particle of type } A \text{ at location } x} v_x .$$

We arbitrarily label the A particles 1, ..., k and the remaining particles k + 1, ..., n. The evolution 
of Φ is exactly the same as it was when all the particles were distinguishable. 

One may add self-loops to the Markov chain to avoid periodicity problems; say the probability 
of a nontrivial transition is a. Typical choices for a are a = 1/2 or a → 0 (continuous time). 

9.1. Shuffling cards on a hypercube. Here we lower bound the mixing time of the Markov 
chain considered by Diaconis and Saloff-Coste that shuffles cards via random transpositions where 
each transposition is an edge of the hypercube. The underlying graph from which we need an 
eigenvector is the hypercube Z_2^d, and the state space of the Markov chain is (Z_2^d)! or $\binom{Z_2^d}{2^{d-1}}$ depending 
on whether we want to shuffle distinct particles or 2^{d−1} 0's and 2^{d−1} 1's (the same lower bound 
applies to both cases). (Here V! denotes the set of permutations of a set V, and $\binom{V}{n}$ denotes the 
set of subsets of V containing n items.) 

We can take our eigenvector to be the function that is 1 on those vertices of the hypercube 
whose first coordinate is 0, and −1 on the other vertices. When we follow a particular particle, the 
probability that the particle gets moved across the first coordinate is a/(d2^{d−1}), so our function is 
an eigenvector for which γ = 2a/(d2^{d−1}) = a/(d2^{d−2}). Here Φ_max = 2^{d−1}, and our bound on R is 4a. Substituting 
into Lemma 5, we obtain a lower bound of 

$$\frac{\log 2^{d-1} + \frac{1}{2}\log\frac{\varepsilon}{16\,d\,2^{d-2}}}{-\log\big(1 - \frac{a}{d\,2^{d-2}}\big)} = (1-o(1))\,\frac{d\,2^{d-2}}{a}\cdot\frac{\log 2}{2}\,d = (1-o(1))\,\frac{\log 2}{8a}\,d^2\,2^d$$

for bounded values of ε. It appears that this bound is correct up to constant factors. 
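The substitution into Lemma 5 can be evaluated numerically. The Python sketch below assumes Lemma 5 in the form that the displayed bounds in this section use, namely [log Φ_max + ½ log(γε/(4R))]/(−log(1 − γ)) with 0 < γ ≤ 2 − √2; the function names are ours. It compares the hypercube bound with its asymptotic form (log 2/(8a)) d² 2^d.

```python
import math

def lemma5_lower_bound(phi_max, gamma, R, eps):
    """[log Phi_max + (1/2) log(gamma * eps / (4R))] / (-log(1 - gamma)),
    the form of Lemma 5 used in Section 9 (assumes 0 < gamma <= 2 - sqrt(2))."""
    assert 0 < gamma <= 2 - math.sqrt(2)
    return (math.log(phi_max) + 0.5 * math.log(gamma * eps / (4 * R))) / -math.log1p(-gamma)

def hypercube_bound(d, a=0.5, eps=0.25):
    """Phi_max = 2^(d-1), gamma = a / (d 2^(d-2)), R = 4a, as in the text."""
    gamma = a / (d * 2 ** (d - 2))
    return lemma5_lower_bound(2 ** (d - 1), gamma, 4 * a, eps)

def hypercube_asymptotic(d, a=0.5):
    return math.log(2) / (8 * a) * d * d * 2 ** d

if __name__ == "__main__":
    for d in (10, 40, 400):
        print(d, hypercube_bound(d) / hypercube_asymptotic(d))
```

The ratio tends to 1 only slowly, because the ½ log(ε/(16d2^{d−2})) term costs about half of log Φ_max plus a log d correction.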

9.2. High-dimensional product graphs. For a fixed connected graph G, consider the nearest 
neighbor random walk on G^d. We can imagine that there is a particle in each of d disjoint copies 
of the graph G, where the particle in the ith copy of G gives the ith coordinate of the walker. 
At each time step a particle chosen from a random copy of G makes a move. For example, the 
Ehrenfest urn model from statistical mechanics is essentially random walk on Z_2^d, so here G consists 
of two vertices and an edge. A common choice for the factor a by which to slow down the walk is 
a = d/(d + 1), in addition to the usual a = 1/2 and a → 0. 

The mixing threshold for random walk on G^d has already been determined, so it is an instructive 
exercise to check that Lemma 5 gives a sharp lower bound in this case. For Z_2^d, Diaconis and 
Shahshahani (1987) showed that there is a sharp variation mixing threshold at (1/4 + o(1)) d log d 
steps. Aldous and Fill (200X, Chapt. 7, sect. 1.7) state that the same approach works for G^d for 
more general graphs G. In continuous time, Diaconis and Saloff-Coste (1996, Theorem 2.9) prove 
an upper bound on the variation mixing time for random walk on G^d. The lower bound that we 
get from Lemma 5 differs from the upper bound by a factor of 1 − o(1). 

Aldous and Diaconis (1987, sect. 7) determined the mixing time threshold of a related random walk 
on G^d where at each time step, the particles in each copy of G get moved; we note that the lower 
bound from Lemma 5 is tight in this case as well. 

To lower bound the mixing time of random walk on G^d, the underlying graph for which we need 
an eigenvector is G + ··· + G. Suppose that we have an eigenvector v of G with eigenvalue 1 − γ_0, 
where γ_0 is the spectral gap. We take our eigenvector to be the canonical extension of v, i.e., the 
function that assigns to each vertex x of G + ··· + G the value of the eigenvector v in the copy of 
G in which x resides. The probability that a particular particle gets moved at a time step is a/d, 
so our function is an eigenvector for which γ = aγ_0/d. (For the walks considered in (Aldous and 
Diaconis, 1987) we have γ = aγ_0.) Here Φ_max = Θ(d), and our bound on R is O(a). (For the walks 
considered in (Aldous and Diaconis, 1987) the bound on R is O(ad).) Substituting into Lemma 5, 
we obtain a lower bound of 

$$\frac{\log\Theta(d) + \frac{1}{2}\log\frac{\gamma\varepsilon}{4R}}{\log 1/(1 - a\gamma_0/d)} = (1-o(1))\,\frac{d\log d}{2a\gamma_0} \qquad (11)$$

when ε does not go to zero too quickly. 
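Bound (11) can be checked numerically in the Ehrenfest case. This Python sketch assumes the Lemma 5 formula [log Φ_max + ½ log(γε/(4R))]/(−log(1 − γ)), together with the hypercube values γ_0 = 2, Φ_max = d, R = 4a and a = d/(d + 1) (these specific constants are our choice for illustration), and compares the result with (1/4) d log d. The function names are ours.

```python
import math

def lemma5_lower_bound(phi_max, gamma, R, eps):
    """[log Phi_max + (1/2) log(gamma * eps / (4R))] / (-log(1 - gamma))."""
    assert 0 < gamma <= 2 - math.sqrt(2)
    return (math.log(phi_max) + 0.5 * math.log(gamma * eps / (4 * R))) / -math.log1p(-gamma)

def ehrenfest_bound(d, eps=0.5, gamma0=2.0):
    """Walk on Z_2^d: d particles, each step moves one particle with total probability a."""
    a = d / (d + 1)                  # Ehrenfest-style holding probability 1/(d+1)
    gamma = a * gamma0 / d           # gamma = a * gamma_0 / d
    return lemma5_lower_bound(d, gamma, 4 * a, eps)   # Phi_max = d, R = 4a (assumed)

if __name__ == "__main__":
    for d in (10**3, 10**6, 10**12):
        print(d, ehrenfest_bound(d) / (0.25 * d * math.log(d)))
```

The ratio is roughly 1 − log(8/ε)/log d, so the 1 − o(1) factor in (11) converges only logarithmically.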

Remark: This application of Lemma 5 is the natural generalization of the approach that Diaconis 
and Shahshahani (1987) used to get their lower bound on the mixing time of Z_2^d. For the walk on 
Z_2^d the eigenvalues of Z_2 are 1 and −1, so γ_0 = 2, which upon substitution into lower bound (11) 
(with a = d/(d + 1)) gives the familiar (1/4) d log d. For Z_2^d the eigenvector used above just counts the difference between 
the number of ones and the number of zeros, which is the same test function that Diaconis and 
Shahshahani (1987) used. 
9.3. Shuffling cards on a grid. Here the cards (or 0's and 1's) are arranged on an ℓ × m grid. 
The state space is ([ℓ] × [m])! or $\binom{[\ell]\times[m]}{\ell m/2}$, and the graph for which we need an eigenvector is the 
ℓ × m grid. (Here [n] denotes {1, 2, ..., n}, and we identify the vertices of the grid with [ℓ] × [m].) 
When ℓ = m, Diaconis and Saloff-Coste show that order (ℓm)² log(ℓm) steps suffice for the Markov 
chain to equilibrate, and conjecture that this is the correct order of magnitude (but see § 9.4). 

Our asymptotic results will be valid as max(ℓ, m) gets large; for convenience suppose that ℓ ≥ m. 
Suppose the vertices are labeled as (i, j) for 1 ≤ i ≤ ℓ and 1 ≤ j ≤ m. We can take our eigenvector 
to be the function that is cos(π(i − 1/2)/ℓ) at vertex (i, j). There are E = ℓ(m − 1) + m(ℓ − 1) 
edges of this grid graph. One can verify that this function is an eigenvector with eigenvalue 

$$1 - \gamma = 1 - 2a/E + (a/E)\,2\cos(\pi/\ell) ,$$

so γ ≈ aπ²/(Eℓ²). The bound for R is order a(1/ℓ)² and Φ_max is order ℓm. Applying Lemma 5, 
we obtain a mixing time lower bound of 

$$\frac{E\ell^2}{a\pi^2}\,(1-o(1))\Big[\log\Theta(\ell m) + \frac{1}{2}\log\Theta(\varepsilon/E)\Big] = (1-o(1))\,\frac{\ell^2(\ell-1/2)(m-1/2)}{a\pi^2}\,\log(\ell m)$$

when ε does not get too small too quickly as ℓm gets large. 

Remark: We can recover our mixing time lower bound for shuffling cards by adjacent transpositions 
by substituting m = 1 and a = 1/2 into our lower bound for the ℓ × m grid. 
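One can verify the grid eigenvector claim directly. The Python sketch below (hypothetical helper name) computes, for one particle on the ℓ × m grid with each of the E edges selected with probability a/E, the expected value of v after one step, and returns the worst deviation from (1 − γ)v together with γ; with m = 1 it also illustrates γ ≈ aπ²/(Eℓ²).

```python
import math

def grid_eigen_error(l, m, a):
    """Max over vertices (i, j) of |E[v(next)] - (1 - gamma) v(i)| for one particle on
    the l x m grid, where each of the E edges is selected with probability a/E
    (otherwise nothing moves) and v(i, j) = cos(pi (i - 1/2) / l)."""
    E = l * (m - 1) + m * (l - 1)
    p = a / E
    lam = 1 - 2 * a / E + (a / E) * 2 * math.cos(math.pi / l)   # claimed 1 - gamma
    v = lambda i: math.cos(math.pi * (i - 0.5) / l)
    worst = 0.0
    for i in range(1, l + 1):
        for j in range(1, m + 1):
            nbrs = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 1 <= i + di <= l and 1 <= j + dj <= m]
            expected = (1 - p * len(nbrs)) * v(i) + sum(p * v(ii) for ii, _ in nbrs)
            worst = max(worst, abs(expected - lam * v(i)))
    return worst, 1 - lam

if __name__ == "__main__":
    print(grid_eigen_error(7, 4, 0.5))
```

Moves in the j-direction leave v unchanged, and at i = 1 or i = ℓ the missing neighbor is compensated exactly because v is symmetric about i = 1/2 and i = ℓ + 1/2.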

9.4. Diaconis and Saloff-Coste's grid shuffling process. The actual Markov chain that Diaconis 
and Saloff-Coste considered had transposition probabilities that were slightly higher along 
edges that touched the border of the grid. This was because their update rule was to pick a random 
vertex, and exchange the particles along a random edge incident to that vertex. Thus each edge 
(u, v) is selected with probability proportional to 1/d(u) + 1/d(v), where d(u) and d(v) are the 
degrees of its endpoints. In some sense it is clear that the slight non-uniformity in the probability 
with which we select edges cannot affect the mixing time too much, but in the introduction we 
promised a rigorous lower bound for Diaconis and Saloff-Coste's Markov chain, so we shall supply 
one. Finding an explicit eigenvector given these boundary conditions seems somewhat painful, as 
does approximating one with sufficient accuracy, so we take a different approach. We show how to 
obtain mixing time lower bounds using only an approximate contraction property. 

Rather than try to approximate an eigenvector, it is easier to make use of the fact that we 
have an exact eigenvector Φ of an approximate state transition matrix, namely the state transition 
matrix considered in § 9.3 and the eigenvector Φ that we used there: 

$$\Phi(S) = \sum_{(i,j):\ \text{state } S \text{ has a particle at } (i,j)} \cos(\pi(i-1/2)/\ell) .$$

(For convenience we slow down the chain in § 9.3 by a factor of a = E/(2ℓm) = 1 − 1/(2ℓ) − 
1/(2m) when doing the comparisons, so that the probability of a transition occurring on a given 
edge is simply 1/(2ℓm).) After one update of the approximate state transition matrix we have 
E[Φ(S')|S] = (1 − γ)Φ(S) with γ as given above. But with the actual state transition matrix, there 
is an O((ℓ + m)/(ℓm)) chance that a vertex on the boundary will be selected, causing the exchange 
process to do something different than the approximate exchange process. To bound 

$$\big|E[\Phi(S')\mid S] - (1-\gamma)\Phi(S)\big|$$

let us focus on a single particle at a time and then make use of linearity of expectation and the 
triangle inequality. If the particle is near the boundary, the corresponding difference is at most 
O(1/(mℓ³)); if the particle is not near the boundary then the corresponding difference is 0. Since 
there are O(ℓ + m) particles near the boundary, with δ = O(1/ℓ³ + 1/(mℓ²)) we have 

$$E[\Phi(S')\mid S] = (1-\gamma)\Phi(S) \pm \delta .$$
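By induction 

$$E[\Phi(S_t)\mid S_0] = (1-\gamma)^t\,\Phi(S_0) \pm \frac{\delta}{\gamma} .$$

Likewise 

$$\begin{aligned}
E[\Phi^2(S_{t+1})\mid S_t] &= (1-2\gamma)\Phi^2(S_t) \pm 2\Phi(S_t)\delta + E[(\Delta\Phi)^2\mid S_t] \\
E[\Phi^2(S_{t+1})] &= (1-2\gamma)E[\Phi^2(S_t)] \pm 2\Phi(S_0)\delta(1-\gamma)^t \pm \frac{2\delta^2}{\gamma} + E[(\Delta\Phi)^2]
\end{aligned}$$

and so by induction 

$$E[\Phi^2(S_t)] \le (1-2\gamma)^t\Phi^2(S_0) + \frac{2\Phi(S_0)\delta}{\gamma}(1-\gamma)^t + \frac{\max E[(\Delta\Phi)^2]}{2\gamma} + \frac{\delta^2}{\gamma^2} .$$

Subtracting off E[Φ(S_t)]² we get 

$$\mathrm{Var}[\Phi(S_t)] \le \frac{4\Phi(S_0)\delta}{\gamma}(1-\gamma)^t + \frac{\max E[(\Delta\Phi)^2]}{2\gamma} + \frac{\delta^2}{\gamma^2} ,$$

where in the last step we have assumed as in Lemma 5 that 1 − 2γ is not excessively negative, i.e. 
that 0 < γ ≤ 2 − √2 ≈ 0.586 (or else 0 < γ ≤ 1 and t is odd). 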
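The induction steps above amount to unrolling two linear recursions in their worst case. The Python sketch below (names are ours; the numeric parameters are arbitrary and chosen with γ small, so that 1 − 2γ > 0) iterates x_{s+1} = (1 − γ)x_s + δ and the matching worst-case second-moment recursion, so the closed-form bounds can be compared with the iterates.

```python
def mean_recursion_worst(phi0, gamma, delta, t):
    """Worst case of E[Phi(S_{s+1})] = (1 - gamma) E[Phi(S_s)] + delta, started at phi0."""
    x = phi0
    for _ in range(t):
        x = (1 - gamma) * x + delta
    return x

def second_moment_recursion_worst(phi0, gamma, delta, R, t):
    """Worst case of y_{s+1} = (1 - 2 gamma) y_s + 2 delta m_s + R, where
    m_s = (1 - gamma)^s phi0 + delta / gamma bounds the mean and R bounds E[(dPhi)^2]."""
    y = phi0 ** 2
    for s in range(t):
        m_s = (1 - gamma) ** s * phi0 + delta / gamma
        y = (1 - 2 * gamma) * y + 2 * delta * m_s + R
    return y

def closed_form_bounds(phi0, gamma, delta, R, t):
    mean_bound = (1 - gamma) ** t * phi0 + delta / gamma
    second_bound = ((1 - 2 * gamma) ** t * phi0 ** 2
                    + 2 * phi0 * delta / gamma * (1 - gamma) ** t
                    + R / (2 * gamma) + delta ** 2 / gamma ** 2)
    return mean_bound, second_bound

if __name__ == "__main__":
    args = (100.0, 0.01, 0.05, 0.3, 500)
    print(mean_recursion_worst(*args[:3], args[4]), second_moment_recursion_worst(*args),
          closed_form_bounds(*args))
```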

For convenience let us follow Diaconis and Saloff-Coste in assuming ℓ = m, so that γ = (π²/2 + 
o(1))/ℓ⁴ and δ = O(1/ℓ³). We also have Φ(S_0) = Θ(ℓ²) when all the particles start on one side, 
and max E[(ΔΦ)²] = O(1/ℓ²). Let K be a suitably large parameter to be selected in a moment. 
Pick t = log(ℓ/K)/γ, so that E[Φ(S_t)] = Θ(Kℓ), and Var[Φ(S_t)] = O(Kℓ²). But in stationarity 
E[Φ(S)] = O(ℓ) (indeed it is 0) and Var[Φ(S)] = Θ(ℓ²). Thus for any given ε we can take K large 
enough so that, when we start the exclusion process with all the particles on one side and run it 
for log(ℓ/K(ε))/γ steps, with probability 1 − ε we are able to distinguish the configuration from a 
state drawn from stationarity. This gives us the desired mixing time lower bound of 

$$\frac{\log\ell}{\gamma} = (2/\pi^2 + o(1))\,\ell^4\log\ell .$$

10. Heuristic arguments for the true constants 

Up until now we have given upper bounds and lower bounds for various mixing times and 
coupling times, and these bounds have typically differed by small constant factors. In this section we 
give heuristic arguments and summarize experimental results for determining the true asymptotic 
constant factors that were given in Table |^. Readers concerned primarily with rigorous arguments 
will find in this section a few theorems and many open problems. 

10.1. A million shuffles or seven. It is well-known that for any Markov chain, when one considers 
the distance from stationarity of the distribution at time t, the variation distance decays as d(t) = 
(1 + o(1))A_d|λ|^t, and similarly the separation distance decays as s(t) = (1 + o(1))A_s|λ|^t, where λ 
is the second largest eigenvalue (in absolute value). For the Markov chains considered here, we 
have rigorous exact values for λ. To paraphrase Diaconis (1996), the goal of finding mixing times 
is not to determine precisely how far from stationarity a deck of cards is after a million shuffles, 
but to determine if seven shuffles are enough. For many Markov chains, the variation distance from 
uniformity stays close to 1 for a time, and then rapidly becomes small and decays exponentially fast 
(see Diaconis (1988)). The seven-shuffle question, which is more relevant to practical applications, 
asks where this cutoff occurs. The million-shuffle question has the virtue of typically being easier to 
answer, and it appears to be relevant to the seven-shuffle question. Diaconis (1996) himself explains 
that the long-term behavior of the Markov chain can be used as a heuristic for predicting which 
Markov chains will exhibit the "cutoff phenomenon" in the time it takes to randomize. Specifically, 
for reversible Markov chains he uses the L² bound 

$$4\,\|P_x^t - U\|^2 \le \sum_{i\ge 1} \lambda_i^{2t}\, v_i(x)^2 ,$$

where the v_i's are an orthonormal (in L²(U)) eigenbasis, and uses the lead term for large t 

$$\sum_{i:\,|\lambda_i|=\lambda} v_i(x)^2\,\lambda^{2t}$$

to make the prediction: if $\sum_{i:|\lambda_i|=\lambda} v_i(x)^2$ is large then the Markov chain probably exhibits a sharp transition. 



This extends an earlier heuristic that Aldous and Diaconis (1987, sect. 7) gave for predicting the 



order of magnitude of the mixing time cutoff for random walk on groups. 

In this section we will work more directly with the separation and variation distances rather than 
use the L² norm, and hypothesize that 

$$d(t) \approx \min\big(1, A_d|\lambda|^t\big) \quad\text{and}\quad s(t) \approx \min\big(1, A_s|\lambda|^t\big) .$$

A priori it is not clear why the variation distance should be well approximated by A_d|λ|^t whenever 
this approximation is not obviously bad, i.e. when it is not larger than 1. Indeed, there are examples 
where this type of approximation fails: for walks on random Cayley graphs on Z_2^d, the variation 
distance and the L² norm bound have essentially the same lead term behavior, but exhibit sharp 
transitions at different times (Wilson, 1997b). Nonetheless the above approximation is valid for 
many Markov chains (see e.g. Diaconis (1988)), and numerical computations described in § 10.7 
give every indication that this approximation holds for the Markov chains that we are interested 
in. We therefore take a moment to formalize this observation: 

Definition 1. A family of Markov chains exhibits a clean (variation) cutoff if for every ε there is 
a K so that for any Markov chain in the family and for any time t, whenever |log(A_d|λ|^t)| > K, 

$$\big|\log d(t) - \min\big(0, \log(A_d|\lambda|^t)\big)\big| < \varepsilon .$$

A clean separation cutoff is defined similarly. If a family of Markov chains exhibits a clean 
cutoff and A_d → ∞, then it will necessarily exhibit a sharp cutoff (mixing time threshold) at 
log A_d / log(1/|λ|). Since we already have the second largest eigenvalue for several classes of Markov 
chains, our goal in this section is to compute A_s and A_d, and report on experiments that suggest 
rather strongly that the Markov chains we are considering do in fact exhibit clean cutoffs. 

10.2. Preliminaries. To obtain our conjectured values for the true constant factors in the mixing 
times of the adjacent transposition Markov chain on permutations and on lattice paths and the 
Luby-Randall-Sinclair chain on lozenge tilings of a hexagon, we will compute A_s and approximate 
A_d for these Markov chains. Before working on these specific chains, for the reader's convenience 
we start with some basic preliminaries that are common to all these examples. 

For the Markov chains considered here, we have a "potential function" Φ (defined in (1), (8), 
and (H)) such that E[Φ(X_{t+1}) | X_t] = λΦ(X_t). If we view Φ as the vector which is Φ(s) in its sth 
coordinate (where s is a state of the chain), then Φ is an eigenvector of the state transition matrix 
with eigenvalue λ. Since the Markov chains are monotone, and Φ is monotone increasing with 
respect to this partial order, we know that λ is the second largest eigenvalue in absolute value. We 
do not a priori know the multiplicity of the eigenvalue λ. 

Since the Markov chains considered here are reversible, their state transition matrices are 
diagonalizable, and there is an orthogonal eigenbasis v_i with eigenvalues λ_i. If the Markov chain is 
started in state s, the distribution at time t is Σ_i α_i λ_i^t v_i where α_i = v_i(s)/(v_i · v_i). 

To determine the quantity d(t) we need the worst starting state, for d̄(t) we need the worst pair of 
starting states, and for the separation distance s(t) we need the worst start and destination states. It 
seems intuitive that for all three measures the worst states must be the top state 1̂ and bottom state 
0̂, though there are some monotone Markov chains for which this intuition is wrong (Häggström, 
2001). Let us consider first the Markov chain on the symmetric group. Since it is vertex-transitive, 
all starting states are equivalent, so 0̂ and 1̂ are worst starting states. Furthermore, if the chain 
starts in 0̂ then the worst destination state is 1̂; Fill (1998) used this fact in the monotone version 
of Fill's algorithm. For the other Markov chains, we can only argue that 0̂ and 1̂ are the right states 
to look at for sufficiently large time t, and then only under the (apparently correct) assumption 
that the second largest eigenvalue λ has multiplicity one. While not completely satisfying, this will 
allow us to compute e.g. A_s under this one assumption. Assuming that λ has multiplicity one, 
from the above formula we see that when t is big enough, the worst starting states are the states 
maximizing |Φ(x)|, i.e. states 0̂ and 1̂, the worst pair of starting states are those that maximize 
|Φ(x) − Φ(y)|, i.e. 0̂ and 1̂, and the worst start and destination states are those that maximize 
−Φ(x)Φ(y), i.e. 0̂ and 1̂. 

Let us suppose that the eigenvalue λ has multiplicity one (the multiplicity is larger for the permutation chain, but we will deal with that later). Recall that $P_x^t$ denotes the distribution at time t when the chain starts in state x, and that U denotes the uniform (stationary) distribution.

For large times t, $P_{\hat 1}^t - U \approx \frac{\Phi(\hat 1)}{\Phi\cdot\Phi}\lambda^t\,\Phi$. If we view Φ as the random variable obtained by picking a uniformly random state X and returning Φ(X), then $\Phi\cdot\Phi = \mathrm{Var}[\Phi]\,N$, where N is the number of states. The vector Φ takes on its most negative value at $\hat 0$, so it contributes $-\Phi(\hat 0)N$ to the separation distance. Hence for large times t we expect the separation distance to be $\frac{-\Phi(\hat 0)\Phi(\hat 1)}{\mathrm{Var}[\Phi]}\lambda^t$, so that

$$A_s \stackrel{?}{=} \frac{-\Phi(\hat 0)\Phi(\hat 1)}{\mathrm{Var}[\Phi]} = \frac{\Phi(\hat 1)^2}{\mathrm{Var}[\Phi]}, \tag{12}$$

where the question mark above the equal sign reminds us that in its derivation we used the assumption that the second largest eigenvalue has multiplicity one.
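Spelled out, the step from the large-t approximation of $P_{\hat 1}^t - U$ to the separation coefficient uses $U(y) = 1/N$, $\Phi\cdot\Phi = N\,\mathrm{Var}[\Phi]$, $\Phi(\hat 1) > 0$, and the fact that Φ is most negative at $\hat 0$:

$$s(t) = \max_y\Big(1 - \frac{P_{\hat 1}^t(y)}{U(y)}\Big) \approx \max_y\Big(-N\,\frac{\Phi(\hat 1)}{\Phi\cdot\Phi}\,\lambda^t\,\Phi(y)\Big) = \frac{-\Phi(\hat 1)\,\Phi(\hat 0)}{\mathrm{Var}[\Phi]}\,\lambda^t .$$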

The contribution of Φ to the variation distance is $\frac{\Phi(\hat 1)\,E[|\Phi|]}{2\,\mathrm{Var}[\Phi]}\lambda^t$. Typically it is hard to get an analytic expression for $E[|\Phi|]$, but heuristically it is plausible that the distribution of Φ is Gaussian, so that $E[|\Phi|] \approx 2\int_0^\infty x e^{-x^2/2}\,\frac{dx}{\sqrt{2\pi}}\,\sqrt{\mathrm{Var}[\Phi]} = \sqrt{2/\pi}\,\sqrt{\mathrm{Var}[\Phi]}$. There are interesting Markov chains, such as random transpositions on permutations, analyzed by Diaconis and Shahshahani (1981), for which the principal eigenvector evaluated at a random state is very far from being approximated by a normal distribution. Thus in principle this approximation should be proved for the chains that we are considering, but we will simply assert that it is intuitively obvious that this approximate normality holds for these chains. Assuming this approximate normality, we have

$$A_d \stackrel{?}{=} \frac{\Phi(\hat 1)\,E[|\Phi|]/2}{\mathrm{Var}[\Phi]} \approx \frac{1}{\sqrt{2\pi}}\,\frac{\Phi(\hat 1)}{\sqrt{\mathrm{Var}[\Phi]}} \stackrel{?}{=} \sqrt{\frac{A_s}{2\pi}} \tag{13}$$

(where we used the approximate normality assumption in the ≈ relation, and the multiplicity-one assumption in the two $\stackrel{?}{=}$ relations). This relation between $A_d$ and $A_s$ is consistent with the folk wisdom that (for reversible chains) it usually takes twice as long for the separation distance to become small as it does for the variation distance.
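To see where the factor of two comes from, take the thresholds to be the times at which $A_s\lambda^t$ and $A_d\lambda^t$ become of order one, i.e. $\log A_s/\log(1/\lambda)$ and $\log A_d/\log(1/\lambda)$ (a heuristic reading, consistent with the threshold estimates for lozenge tilings below). Since (13) gives $A_s \stackrel{?}{=} 2\pi A_d^2$,

$$\frac{\log A_s}{\log(1/\lambda)} \stackrel{?}{=} \frac{2\log A_d + \log(2\pi)}{\log(1/\lambda)} \approx 2\,\frac{\log A_d}{\log(1/\lambda)} \qquad (A_d \to \infty).$$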

Remark: A notable non-reversible chain where this relation fails is the riffle-shuffle Markov chain (Bayer and Diaconis, 1992). What fails is the relation $a_i = v_i(s)/(v_i\cdot v_i)$, which assumes reversibility. With the correct $a_i$'s, the above heuristic reasoning gives the right thresholds for the riffle-shuffle chain as well.

10.3. Lozenge tilings. For the Luby-Randall-Sinclair Markov chain on lozenge tilings of the order ℓ hexagon, we used $E[(\Delta\Phi)^2] \le O(\ell)$ to get a bound on the variance of the height function. While there do exist atypical configurations for which $E[(\Delta\Phi)^2]$ is this large, it seems that more often $E[(\Delta\Phi)^2]$ is closer to O(1). If we substitute this bound on $E[(\Delta\Phi)^2]$ into Lemma ^ we would get that the variance in the potential function is $O(\ell^4)$, with typical deviations from the mean being $O(\ell^2)$. These heuristic bounds are in fact correct in stationarity. It is known that the variance in the total height is $\ell^4/4$, and more generally $abc(a+b+c)/12$ for the hexagon with side lengths a, b, c, a, b, c (pium 1996). And as we shall see from Eq. (14) below, in stationarity the variance in Φ is $abc(a+b+c)/(2\pi^2 + o(1))$. Thus for the order ℓ hexagon in stationarity, the potential function Φ is typically within $O(\ell^2)$ of its expected value 0. If (as seems likely) for each time t, $\Phi(X_t)$ is also typically within $O(\ell^2)$ of its expected value, then, since $\Phi(\hat 1) = \Theta(\ell^3)$, we would get a mixing time lower bound of $\log(\ell^3/\ell^2)/\log(1/|\lambda|) = (16/\pi^2 + o(1))\,\ell^4\log\ell$. Intuitively the lozenge tiling is random when the distribution of its average height is close to its stationary distribution.

Theorem 13. Consider the Luby-Randall-Sinclair lozenge tiling Markov chain on the hexagon with side lengths a, b, c, a, b, c, with the tower moves parallel to the "c" sides. Assuming the second largest eigenvalue has multiplicity one,

$$A_s \stackrel{?}{=} \frac{\Phi(\hat 1)^2}{\mathrm{Var}[\Phi]} = \frac{c\,[(a+b)^2 - 1]\,\sin^2(\pi a/(a+b))}{ab(a+b+c)\,[1 - \cos(\pi/(a+b))]}.$$

Proof. The first relation is just Eq. (12), which is where we use the multiplicity-one assumption. To compute the variance of Φ in stationarity, we let $S_i$ denote the total height in column i ($-a \le i \le b$), and write

$$\Phi = \sum_{i=-a}^{b} (S_i - E[S_i])\,\sin\frac{\pi(a+i)}{a+b},$$

$$\mathrm{Var}[\Phi] = \sum_{-a \le i,j \le b} \mathrm{Cov}(S_i, S_j)\,\sin\frac{\pi(a+i)}{a+b}\,\sin\frac{\pi(a+j)}{a+b},$$

and use a formula (Wilson, 2001) for the covariances of the $S_i$'s,

$$\mathrm{Cov}(S_i, S_j) = (a+i)(b-j)\,\frac{abc(a+b+c)}{(a+b)^2\,((a+b)^2 - 1)} \qquad (i \le j),$$

to find (with the help of Maple) that

$$\mathrm{Var}[\Phi] = \frac{1/4}{1 - \cos(\pi/(a+b))}\,\frac{abc(a+b+c)}{(a+b)^2 - 1} \approx \frac{abc(a+b+c)}{2\pi^2}. \tag{14}$$

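To evaluate $\Phi(\hat 1)$ we write

$$S_i(\hat 1) - E[S_i] = \begin{cases} (a+i)\,bc/(a+b), & -a \le i \le 0,\\ (b-i)\,ac/(a+b), & 0 \le i \le b,\end{cases}$$

$$\Phi(\hat 1) = \sum_{i=-a}^{0} \frac{(a+i)bc}{a+b}\,\sin\frac{\pi(a+i)}{a+b} + \sum_{i=1}^{b} \frac{(b-i)ac}{a+b}\,\sin\frac{\pi(a+i)}{a+b},$$

which with the help of Maple simplifies to

$$\Phi(\hat 1) = \frac{c}{2}\,\frac{\sin(\pi a/(a+b))}{1 - \cos(\pi/(a+b))}.$$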
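The two "with the help of Maple" simplifications can be checked numerically. The sketch below evaluates the double sum for $\mathrm{Var}[\Phi]$ from the covariance formula and the sum for $\Phi(\hat 1)$ from the piecewise deviations $S_i(\hat 1) - E[S_i]$ as written above, and compares each with its closed form; the function names are hypothetical, and the piecewise deviations are read off the sum for $\Phi(\hat 1)$ rather than derived independently here.

```python
import math

def _weight(a, b, i):
    """Sine weight of column i (-a <= i <= b) in the potential function."""
    return math.sin(math.pi * (a + i) / (a + b))

def var_phi_by_sum(a, b, c):
    """Var[Phi] as the double sum of Cov(S_i, S_j) times the sine weights, using
    Cov(S_i, S_j) = (a+i)(b-j) abc(a+b+c) / ((a+b)^2 ((a+b)^2 - 1)) for i <= j."""
    L = a + b
    const = a * b * c * (a + b + c) / (L ** 2 * (L ** 2 - 1))
    total = 0.0
    for i in range(-a, b + 1):
        for j in range(-a, b + 1):
            lo, hi = min(i, j), max(i, j)
            total += (a + lo) * (b - hi) * const * _weight(a, b, i) * _weight(a, b, j)
    return total

def var_phi_closed(a, b, c):
    L = a + b
    return 0.25 / (1 - math.cos(math.pi / L)) * a * b * c * (a + b + c) / (L ** 2 - 1)

def phi_top_by_sum(a, b, c):
    """Phi(top) from the piecewise-linear deviations S_i(top) - E[S_i]."""
    L = a + b
    left = sum((a + i) * b * c / L * _weight(a, b, i) for i in range(-a, 1))
    right = sum((b - i) * a * c / L * _weight(a, b, i) for i in range(1, b + 1))
    return left + right

def phi_top_closed(a, b, c):
    L = a + b
    return c / 2 * math.sin(math.pi * a / L) / (1 - math.cos(math.pi / L))

for a, b, c in [(1, 1, 1), (2, 1, 1), (3, 5, 2), (4, 4, 7)]:
    print((a, b, c), var_phi_by_sum(a, b, c), var_phi_closed(a, b, c),
          phi_top_by_sum(a, b, c), phi_top_closed(a, b, c))

# Gaussian approximation sqrt(2/pi) * sqrt(Var[Phi]) for the 2 x 2 x 2 cube: about 1.3187
print(math.sqrt(2 / math.pi) * math.sqrt(var_phi_closed(2, 2, 2)))
```

Each printed pair should agree to round-off. The last line reproduces the value $\sqrt{2/\pi}\sqrt{\mathrm{Var}[\Phi]} \approx 1.319$ quoted below for the 2 × 2 × 2 cube.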



When a = b = c = ℓ, Theorem 13 gives $A_s \approx \frac{32}{3\pi^2}\ell^2$ and $A_d \approx \sqrt{16/(3\pi^3)}\,\ell$. Since $1 - \lambda \approx \pi^2/(16\ell^4)$, we estimate the separation threshold to be $(16/\pi^2)\,\ell^4\log\!\big(\tfrac{32}{3\pi^2}\ell^2\big) \approx (32/\pi^2)\,\ell^4\log\ell$ and the variation threshold to be $(16/\pi^2)\,\ell^4\log\ell$, which matches the intuitive lower bound given above.
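The asymptotic constants can be checked against the exact expression in Theorem 13 (still under its multiplicity-one assumption) with a few lines of arithmetic. In this sketch, `lozenge_constants` is a hypothetical helper that evaluates Theorem 13 for $A_s$ and takes $A_d = \sqrt{A_s/(2\pi)}$ from Eq. (13); the printed ratios to $32\ell^2/(3\pi^2)$ and $\sqrt{16/(3\pi^3)}\,\ell$ tend to 1, with relative error of order $1/\ell^2$.

```python
import math

def lozenge_constants(a, b, c):
    """A_s from Theorem 13 (multiplicity-one assumption) and A_d = sqrt(A_s/(2*pi)) from Eq. (13)."""
    L = a + b
    A_s = (c * (L ** 2 - 1) * math.sin(math.pi * a / L) ** 2
           / (a * b * (a + b + c) * (1 - math.cos(math.pi / L))))
    return A_s, math.sqrt(A_s / (2 * math.pi))

for ell in (4, 16, 64, 256):
    A_s, A_d = lozenge_constants(ell, ell, ell)
    ratio_s = A_s / (32 * ell ** 2 / (3 * math.pi ** 2))
    ratio_d = A_d / (math.sqrt(16 / (3 * math.pi ** 3)) * ell)
    print(ell, ratio_s, ratio_d)   # both ratios approach 1 as ell grows
```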

As a check of the $E[|\Phi|] \approx \sqrt{2/\pi}\,\sqrt{\mathrm{Var}[\Phi]}$ approximation, for the 3 × 3 × 3 cube $E[|\Phi|] = 2.872$ while $\sqrt{2/\pi}\,\sqrt{\mathrm{Var}[\Phi]} = 2.892$, an error of about 1%. Even for the 2 × 2 × 2 cube, the error between $E[|\Phi|] = 1.307$ and $\sqrt{2/\pi}\,\sqrt{\mathrm{Var}[\Phi]} = 1.319$ is less than 1%.
10.4. Lattice paths. We have effectively already computed $A_s$ and approximated $A_d$ for lattice paths in the a × b box: just set c = 1 in the above formulas for the a × b × c lozenge tiling region.

For lattice paths we can give some additional intuition. Consider the lattice path Markov chain on an n/2 × n/2 box. In stationarity the height fluctuation at a given site near the center of the path will be $\Theta(\sqrt{n})$, and the fluctuations in the potential function will be $\Theta(n^{3/2})$. Initially the potential function is $\Theta(n^2)$, and at any given time the fluctuations in the potential function are about $O(n^{3/2})$ about its expected value. We used these facts to obtain the lower bound on the variation mixing time of $\log(n^2/n^{3/2})/\log(1/|\lambda|)$. Intuitively the path is about random when its average height is close to its stationary distribution, which would imply that the above lower bound is tight.
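For a rough sense of scale, suppose $\log(1/\lambda) \approx \pi^2/(2n^3)$ for the n/2 × n/2 lattice path chain. This is an assumption here, namely the value implied by the coupling-time estimate $\log n/\log(1/\lambda) = (2/\pi^2)\,n^3\log n$ quoted in §10.8, not something derived in this subsection. The lower bound then evaluates to

$$\frac{\log(n^2/n^{3/2})}{\log(1/|\lambda|)} = \frac{\tfrac12\log n}{\log(1/|\lambda|)} \approx \frac{\tfrac12\log n}{\pi^2/(2n^3)} = \frac{1}{\pi^2}\,n^3\log n,$$

half of that coupling-time estimate.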



10.5. Permutations. 

Theorem 14. For the random adjacent transposition Markov chain on permutations of order n, assuming the second largest eigenvalue has multiplicity n − 1, $A_s = n - 1$.

Remark: It is not clear to what extent it is a coincidence that $A_s$ is the multiplicity of λ. This relation does not hold for the tiling or lattice path Markov chains, but there the state spaces do not have a group structure. For $\mathbb{Z}_2^n$, $A_s = n$ is the multiplicity of the second largest eigenvalue. But for the cycle the second largest eigenvalue has multiplicity two for n > 3, while $A_s = 2$ only for even n > 3, and $A_s = 2\cos(\pi/n)$ for odd n > 3.



Proof of Theorem 14. For random adjacent transpositions on permutations, for any given card i one can define an eigenvector with eigenvalue λ based on the location $\sigma^{-1}(i)$ of that card:

$$f_i(\sigma) = \cos[\pi(\sigma^{-1}(i) - 1/2)/n].$$

There is one linear dependency amongst these eigenvectors ($\sum_i f_i = 0$), and it appears that there are no other eigenvectors with eigenvalue λ. Note that

$$f_i\cdot f_i = (n-1)!\sum_{j=1}^{n}\cos^2(\pi(j - 1/2)/n) = (n-1)!\sum_{j=1}^{n}\Big[\frac12 + \frac{\cos(2\pi(j-1/2)/n)}{2}\Big] = n!/2.$$

By symmetry considerations $f_i\cdot f_j = f_i\cdot f_k$ when $j \ne i \ne k$. Since

$$0 = f_1\cdot\Big[\sum_j f_j\Big] = \frac{n!}{2} + (n-1)\,f_1\cdot f_2,$$

we have that

$$f_i\cdot f_j = -\frac{n!}{2(n-1)}$$

when $i \ne j$.

As before, to determine $A_s$ we compute the coefficients of the $f_i$'s in the eigenbasis decomposition of the indicator vector $\mathbf{1}_{\hat 1}$ of state $\hat 1$. Since there is a linear relation amongst the $f_i$'s, there will be a one-parameter family of valid sets of coefficients; we just need one such valid set of coefficients. We could be methodical and use the Gram-Schmidt procedure to extract n − 1 orthogonal vectors from the $f_i$'s and then use these to get a valid set of coefficients, but the guess-and-verify method is less messy. Consider the function

$$\Phi = \sum_{i=1}^{n} f_i(\hat 1)\,f_i. \tag{15}$$
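We have (using $\sum_i f_i(\hat 1) = 0$)

$$f_j\cdot\Phi = \frac{n!}{2}\Big[1 + \frac{1}{n-1}\Big] f_j(\hat 1) = \frac{n!\,n}{2(n-1)}\,f_j(\hat 1).$$

By comparison

$$f_j\cdot\mathbf{1}_{\hat 1} = f_j(\hat 1).$$

Since the dot products with the $f_j$'s are the same (up to the constant factor $n!\,n/(2(n-1))$), by linear algebra we conclude that $\frac{2(n-1)}{n!\,n}\Phi$ has the desired coefficients. Next we evaluate this eigenfunction at $\hat 0$ and multiply by $-1/U(\hat 0) = -n!$ to obtain $A_s$. Since $\hat 0$ is the reversal of $\hat 1$, $f_i(\hat 0) = -f_i(\hat 1)$, so

$$A_s = -n!\cdot\frac{2(n-1)}{n!\,n}\sum_{i=1}^{n} f_i(\hat 1)\,f_i(\hat 0) = \frac{2(n-1)}{n}\sum_{i=1}^{n}\cos^2(\pi(i - 1/2)/n) = n - 1. \qquad \Box$$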
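Both ingredients of the proof can be checked by brute force for small n. The sketch assumes a specific convention for the chain, namely: hold with probability 1/2, otherwise swap a uniformly random adjacent pair of positions. Under that convention $f_i$ has eigenvalue $1 - (1 - \cos(\pi/n))/(n-1)$. The sketch first checks the dot products $f_i\cdot f_i$ and $f_i\cdot f_j$, then estimates $A_s$ directly by evolving $\mathbf{1}_{\hat 1} - U$ and reading off $1 - N P^t(\hat 1, \hat 0) \approx A_s\lambda^t$. The helper names are hypothetical; states are arrangements with `arrangement[pos] = card`, the identity playing the role of $\hat 1$ and the reversal that of $\hat 0$.

```python
import itertools
import math

def card_eigenvectors(n):
    """f_i(sigma) = cos(pi * (1-based position of card i - 1/2) / n), one list per card."""
    states = list(itertools.permutations(range(n)))
    f = [[math.cos(math.pi * (s.index(i) + 0.5) / n) for s in states] for i in range(n)]
    return states, f

def lazy_step(vec, states, index, n):
    """One step of the assumed lazy adjacent-transposition chain (its matrix is symmetric)."""
    out = [0.5 * v for v in vec]
    for s, v in zip(states, vec):
        for k in range(n - 1):
            t = list(s)
            t[k], t[k + 1] = t[k + 1], t[k]
            out[index[tuple(t)]] += 0.5 / (n - 1) * v
    return out

def estimate_A_s(n, steps=400):
    states = list(itertools.permutations(range(n)))
    index = {s: k for k, s in enumerate(states)}
    N = len(states)
    top, bottom = index[tuple(range(n))], index[tuple(reversed(range(n)))]
    vec = [-1.0 / N] * N              # indicator of top minus U: no constant component
    vec[top] += 1.0
    for _ in range(steps):
        prev = vec
        vec = lazy_step(vec, states, index, n)
        mean = sum(vec) / N
        vec = [v - mean for v in vec]  # strip round-off drift along the constant eigenvector
    lam = vec[bottom] / prev[bottom]   # decay rate of the dominant remaining component
    # 1 - N * P^t(top, bottom) = -N * vec[bottom], which behaves like A_s * lam^t for large t
    return lam, -N * vec[bottom] / lam ** steps

for n in (3, 4):
    lam, A_s = estimate_A_s(n)
    print(n, lam, 1 - (1 - math.cos(math.pi / n)) / (n - 1), A_s)
```

For n = 3 and 4 the estimated $A_s$ comes out as n − 1 to many digits, and the estimated λ matches the closed form; the multiplicity-one-per-card structure of the eigenspace is taken on trust here rather than proved.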

10.6. Shape of the thresholds. In Figures 6, 7, and 8, where we present numerical data for separation and variation distances, we also plot some hypothetical asymptotic curves for the separation and variation distances, in particular

$$s(t) = 1 - \exp(-A_s\lambda^t),$$

$$d(t) = \operatorname{erf}\!\big(\tfrac{\sqrt{\pi}}{2}A_d\lambda^t\big),$$

$$\bar d(t) = \operatorname{erf}\!\big(\sqrt{\pi}\,A_d\lambda^t\big),$$

where $\operatorname{erf}(x) = \int_{-x}^{x} e^{-t^2}\,dt/\sqrt{\pi}$ is the error function (so that $d(t) \approx A_d\lambda^t$ and $\bar d(t) \approx 2A_d\lambda^t$ for large t). For random walk on $\mathbb{Z}_2^n$, the variation distance d(t) was shown to take the above form by Diaconis, Graham, and Morrison (1990). The intuition for why it should also hold for the Markov chains that we are interested in is essentially the same as for their proof for $\mathbb{Z}_2^n$. When X is a random state drawn from the uniform distribution, Φ(X) is well approximated by a Gaussian. (For $\mathbb{Z}_2^n$, Φ is the number of ones.) When t is near the mixing time threshold, $\Phi(X_t)$ should be well approximated by a Gaussian with the same variance as Φ(X) but with mean $\Phi(X_0)\lambda^t$. The intuition, which was made rigorous for $\mathbb{Z}_2^n$ but appears difficult to prove for the other chains, is that Φ is the best test statistic for distinguishing X from $X_t$. The amount by which these two Gaussians fail to overlap gives the asymptotic curve for d(t). The curve for $\bar d(t)$ also follows from this heuristic reasoning.
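The three curves are easy to evaluate, and the Gaussian heuristic behind d(t) and $\bar d(t)$ can be checked numerically. Suppose Φ(X) is Gaussian with variance $\sigma^2$ and $\Phi(X_t)$ is Gaussian with the same variance and mean $\Phi(\hat 1)\lambda^t$. Their total variation distance should then equal $\operatorname{erf}(\tfrac{\sqrt\pi}{2}A_d\lambda^t)$ with $A_d = \Phi(\hat 1)/\sqrt{2\pi\sigma^2}$ as in (13). Doubling the separation of the means (the worst pair $\hat 0$, $\hat 1$) should give $\bar d(t)$. The parameter values below are made up for illustration, and the integral is done by a plain midpoint Riemann sum.

```python
import math

def s_curve(t, A_s, lam):
    """Hypothesized separation curve s(t) = 1 - exp(-A_s lambda^t)."""
    return 1.0 - math.exp(-A_s * lam ** t)

def d_curve(t, A_d, lam):
    """Hypothesized variation curve d(t) = erf(sqrt(pi)/2 * A_d lambda^t)."""
    return math.erf(math.sqrt(math.pi) / 2 * A_d * lam ** t)

def dbar_curve(t, A_d, lam):
    """Hypothesized worst-pair curve dbar(t) = erf(sqrt(pi) * A_d lambda^t)."""
    return math.erf(math.sqrt(math.pi) * A_d * lam ** t)

def gaussian_tv(mu, sigma, grid=100000, width=12.0):
    """Total variation distance between N(mu, sigma^2) and N(0, sigma^2), mu >= 0,
    by a midpoint Riemann sum of half the absolute difference of the densities."""
    lo, hi = -width * sigma, mu + width * sigma
    h = (hi - lo) / grid
    total = 0.0
    for k in range(grid):
        x = lo + (k + 0.5) * h
        total += abs(math.exp(-0.5 * ((x - mu) / sigma) ** 2) - math.exp(-0.5 * (x / sigma) ** 2))
    return total * h / (2 * sigma * math.sqrt(2 * math.pi))

# Illustrative values: stationary spread sigma, top-state value Phi(1), eigenvalue lam.
sigma, phi_top, lam = 1.3, 2.0, 0.9
A_d = phi_top / (math.sqrt(2 * math.pi) * sigma)    # Eq. (13) with Var[Phi] = sigma^2
for t in (1, 5, 20):
    mu = phi_top * lam ** t
    print(t, gaussian_tv(mu, sigma), d_curve(t, A_d, lam),
          gaussian_tv(2 * mu, sigma), dbar_curve(t, A_d, lam))
```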

There does not seem to be any similar intuition for why the asymptotic curve for s(t) should be what is given above, other than that it holds for $\mathbb{Z}_2^n$ and other high-dimensional product graphs, and it appears to be a good fit at least for the tiling Markov chains.

Remark: There exist Markov chains for which the asymptotic curves for d(t) are not given by the above formula. For example, Diaconis, Fill, and Pitman (1992) analyze the top-to-random shuffle and determine the curve for d(t) to be given by an explicit piecewise-analytic formula, which in particular is different from the above formula. Nonetheless, the chains we are interested in seem to have more in common with random walk on $\mathbb{Z}_2^n$ than they do with the top-to-random shuffle, and we are confident that the above formulas for d(t) and $\bar d(t)$ are the correct asymptotic shape of the transition.
10.7. Numerical experiments. Figures 6, 7, and 8 show numerical data for the convergence rates of the three classes of Markov chains considered here. The data was obtained by explicit multiplications of the state transition matrix, and assumes that $\hat 0$ and $\hat 1$ are the worst starting states for d(t), the worst pair of starting states for $\bar d(t)$, and the worst start and destination states for s(t). For each system size the convergence data was scaled and shifted to make the transitions line up. The amount by which to scale and shift was computed a priori using the preceding formulas for λ, $A_s$, and $A_d$; in particular the values for λ and $A_s$ are exact, and the value for $A_d$ was approximated using (13).

The curves line up quite nicely when d(t), $\bar d(t)$, or s(t) are small, and they line up progressively better with increasing system size. This fact, and additional graphs not shown here, indicate that the Markov chains have a clean cutoff, and hence that the cutoff phenomenon does indeed occur where we expect it. The data looks less good when d(t), $\bar d(t)$, or s(t) are large; this distortion is due in part to the fact that for finite system sizes there is a finite $x_{\min}$, while in the idealized limit $x_{\min} = -\infty$. For s(t), $x_{\min}$ is about twice as large in magnitude as for d(t), so this distortion is much less pronounced for s(t). For $\bar d(t)$, $x_{\min}$ is log 2 more negative than for d(t), which is enough to make the curves line up noticeably better for $\bar d(t)$ than for d(t).

10.8. Monte Carlo experiments. To estimate the coupling time for the three classes of Markov chains, one can actually run the Markov chain until the upper and lower configurations coalesce, repeat many times, and compare the results for different system sizes. The obvious advantage of Monte Carlo over numerical experiments is that one can handle much larger system sizes. But even with large system sizes, when comparing different system sizes it is still much better to rescale time by

$$\log\frac{1}{\lambda} = \log\frac{1}{1 - (1 - \cos(\pi/n))/(n-1)}$$

rather than by its asymptotic value $\pi^2/(2n^3)$.
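A quick comparison shows why the exact rate matters at moderate n. The sketch computes the exact $\log(1/\lambda)$ for $\lambda = 1 - (1 - \cos(\pi/n))/(n-1)$ and divides it by $\pi^2/(2n^3)$. The ratio is roughly $1 + 1/n$, so using the asymptotic value would stretch the time axis by several percent for the sizes accessible to simulation.

```python
import math

def log_inv_lambda(n):
    """Exact log(1/lambda) for lambda = 1 - (1 - cos(pi/n)) / (n - 1)."""
    return -math.log1p(-(1 - math.cos(math.pi / n)) / (n - 1))

def log_inv_lambda_asymptotic(n):
    return math.pi ** 2 / (2 * n ** 3)

for n in (8, 16, 32, 64, 128, 1000):
    print(n, log_inv_lambda(n) / log_inv_lambda_asymptotic(n))
```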

After rescaling time in this manner it becomes quite clear that the coupling time for n/2 × n/2 lattice paths is about $\log n/\log(1/\lambda) \approx (2/\pi^2)\,n^3\log n$, and the coupling time for permutations is about $\log n^2/\log(1/\lambda) \approx (4/\pi^2)\,n^3\log n$. Of course this coupling time estimate for lattice paths is actually rigorous thanks to Theorems 11 and 12. Estimating the correct constant for the tiling Markov chain is, however, much more challenging, and we have not yet succeeded in doing this.
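A minimal sketch of such a coupling-time experiment for lattice paths, under stated assumptions. A path in the a × b box is stored as its height sequence $h_0, \dots, h_{a+b}$ with steps ±1. One move of the (assumed) chain picks a uniformly random interior position and a fair coin, then raises a local minimum or lowers a local maximum there. The top and bottom paths are driven by the same random choices, which preserves their order, so once they meet, every other starting path has met them too. The function name is hypothetical, and times are counted in single moves rather than the normalized time used in the text.

```python
import random

def lattice_path_coupling_time(a, b, rng):
    """Run the top and bottom paths of the a x b box with shared randomness until they meet.
    A path is its height sequence h[0..a+b] (a up-steps, b down-steps).
    Assumed move: pick interior position k and a fair coin; on 'up' a local minimum at k
    is raised by 2, on 'down' a local maximum at k is lowered by 2."""
    n = a + b
    top = [k if k <= a else 2 * a - k for k in range(n + 1)]       # all up-steps first
    bottom = [-k if k <= b else k - 2 * b for k in range(n + 1)]   # all down-steps first
    steps = 0
    while top != bottom:
        k = rng.randrange(1, n)
        up = rng.random() < 0.5
        for h in (top, bottom):   # the same (k, up) drives both copies: a monotone coupling
            if up and h[k - 1] == h[k + 1] == h[k] + 1:
                h[k] += 2
            elif not up and h[k - 1] == h[k + 1] == h[k] - 1:
                h[k] -= 2
        steps += 1
    return steps, top

rng = random.Random(2024)
n = 8
times = [lattice_path_coupling_time(n // 2, n // 2, rng)[0] for _ in range(200)]
print(sum(times) / len(times))   # mean coupling time, in single moves
```

Repeating this over several n and dividing by $\log n/\log(1/\lambda)$, with the move count converted to the chain's time units, is the kind of comparison described above.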

Surprisingly, one can use Fill's algorithm to do similar Monte Carlo experiments to measure the separation distance (Fill, 1998, sect. 9); we did not do this since we already had the numerical data. We are unaware of any similar Monte Carlo method for measuring the variation distance.

11. Concluding remarks 

• Adding weights to distance functions can be useful when proving mixing time upper bounds. 
While the optimal weighting scheme will be related to an eigenvector of the state transition matrix, 
there is no need to diagonalize the matrix, nor is it even necessary to exhibit a single eigenvector 
to produce an effective weighting scheme that yields good upper bounds. For example, one could 
have used parabolic weights rather than the sinusoidal weights that we used, and still derived mixing time upper bounds that are only a constant factor worse than the ones we derived.

• There is a variety of Markov chains for which a mixing time cutoff phenomenon has been proved. In future work on sharp mixing time thresholds, it would be worthwhile to determine whether or not these Markov chains exhibit a clean cutoff as defined in § 10.1.
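The first remark above, that parabolic weights cost only a constant factor, can be made concrete in a toy setting. This is an illustration under assumptions, not the weighted distance used in the proofs. Take weights $w_0, \dots, w_n$ on a path with $w_0 = w_n = 0$, scored by the worst-case relative drift $\min_k -(w_{k-1} + w_{k+1} - 2w_k)/w_k$ under the discrete Laplacian. Sinusoidal weights $\sin(\pi k/n)$ are an exact eigenvector and achieve $2(1 - \cos(\pi/n)) \approx \pi^2/n^2$ at every site. Parabolic weights $k(n-k)$ have second difference −2 everywhere, so their worst case is $8/n^2$ at the midpoint, a factor of about $8/\pi^2 \approx 0.81$ worse.

```python
import math

def worst_contraction_rate(weights):
    """min over interior k of -(w[k-1] + w[k+1] - 2*w[k]) / w[k], with w[0] = w[n] = 0."""
    n = len(weights) - 1
    return min(-(weights[k - 1] + weights[k + 1] - 2 * weights[k]) / weights[k]
               for k in range(1, n))

def sine_weights(n):
    return [math.sin(math.pi * k / n) for k in range(n + 1)]

def parabolic_weights(n):
    return [k * (n - k) for k in range(n + 1)]

for n in (10, 100, 1000):
    r_sin = worst_contraction_rate(sine_weights(n))
    r_par = worst_contraction_rate(parabolic_weights(n))
    print(n, r_sin * n ** 2, r_par * n ** 2, r_par / r_sin)   # ratio tends to 8/pi^2
```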
Figure 6. Data for the Luby-Randall-Sinclair lozenge tiling chain on the hexagon with side lengths a, b, c, a, b, c.